SCOPUS Information Search Platform

Proceedings of the IEEE International Conference on Computer Vision

Volume 2015 International Conference on Computer Vision, ICCV 2015, Issue , 2015, Pages 2668-2676

Common subspace for model and similarity: Phrase learning for caption generation from images

(4) Ushiku, Yoshitaka (a); Yamaguchi, Masataka (a); Mukuta, Yusuke (a); Harada, Tatsuya (a)

(a) UNIVERSITY OF TOKYO (Japan)

Author keywords

[No Author keywords available]

Indexed keywords

NATURAL LANGUAGE PROCESSING SYSTEMS; SAMPLING; VECTORS;

FEATURE VECTORS; INPUT IMAGE; LEARNING METHODS; NATURAL LANGUAGE PROCESSING; SINGLE WORDS; TRAINING SAMPLE; VISUAL COMPLEXITY;

COMPUTER VISION;

EID: 84973861187 PISSN: 15505499 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICCV.2015.306 Document Type: Conference Paper

Times cited : (57)

References (48)

1
- 77951458444
- An online algorithm for large scale image similarity learning
- 3
- G. Chechik, U. Shalit, V. Sharma, and S. Bengio. An online algorithm for large scale image similarity learning. In NIPS, 2009. 3
- (2009) NIPS
- Chechik, G.¹ Shalit, U.² Sharma, V.³ Bengio, S.⁴

2
- 84957029470
- Mind's eye: A recurrent visual representation for image caption generation
- 3, 7
- X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015. 3, 7
- (2015) CVPR
- Chen, X.¹ Zitnick, C.L.²

3
- 84455207551
- Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
- 6
- G. Doddington. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In HLT, 2002. 6
- (2002) HLT
- Doddington, G.¹

4
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- 3, 7, 8
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015. 3, 7, 8
- (2015) CVPR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

5
- 84959250180
- From captions to visual concepts and back
- 3, 7
- H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015. 3, 7
- (2015) CVPR
- Fang, H.¹ Gupta, S.² Iandola, F.N.³ Srivastava, R.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰ Zitnick, C.L.¹¹ Zweig, G.¹²

6
- 80052017343
- Every picture tells a story: Generating sentences from images
- 2
- A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010. 2
- (2010) ECCV
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

7
- 84887839738
- Phrasal recognition
- 1, 3
- A. Farhadi and M. A. Sadeghi. Phrasal recognition. PAMI, 35 (12): 2854-65, 2013. 1, 3
- (2013) PAMI , vol.35 , Issue.12 , pp. 2854-2865
- Farhadi, A.¹ Sadeghi, M.A.²

8
- 38049183286
- The iapr tc-12 benchmark: A new evaluation resource for visual information systems
- 5, 10
- M. Grubinger, P. Clough, H. Müller, and T. Deselaers. The iapr tc-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, 2006. 5, 10
- (2006) International Workshop OntoImage
- Grubinger, M.¹ Clough, P.² Müller, H.³ Deselaers, T.⁴

9
- 84973931408
- From image annotation to image description
- 5, 6, 13
- A. Gupta and P. Mannem. From image annotation to image description. In ICONIP, 2012. 5, 6, 13
- (2012) ICONIP
- Gupta, A.¹ Mannem, P.²

10
- 85059866463
- Choosing linguistics over vision to describe images
- 1, 3, 5, 7
- A. Gupta, Y. Verma, and C. V. Jawahar. Choosing linguistics over vision to describe images. In AAAI, 2012. 1, 3, 5, 7
- (2012) AAAI
- Gupta, A.¹ Verma, Y.² Jawahar, C.V.³

11
- 0031573117
- Long short-term memory
- 3, 7
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9 (8): 1735-1780, 1997. 3, 7
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

12
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- 2
- M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47: 853-899, 2013. 2
- (2013) JAIR , vol.47 , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

13
- 84913555165
- Caffe: Convolutional architecture for fast feature embedding
- 5
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint, (1408.5093), 2014. 5
- (2014) Caffe: Convolutional Architecture for Fast Feature Embedding
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

14
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- 3, 7, 8
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015. 3, 7, 8
- (2015) CVPR
- Karpathy, A.¹ Fei-Fei, L.²

15
- 84944113729
- Unifying visual-semantic embeddings with multimodal neural language models
- 3
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. In NIPS, 2014. 3
- (2014) NIPS
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

16
- 85146417759
- Accurate unlexicalized parsing
- 5
- D. Klein and C. D. Manning. Accurate unlexicalized parsing. In ACL, 2003. 5
- (2003) ACL
- Klein, D.¹ Manning, C.D.²

17
- 84876231242
- Imagenet classification with deep convolutional neural networks
- 1, 3, 5, 7
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012. 1, 3, 5, 7
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

18
- 80052901011
- Baby talk: Understanding and generating image descriptions
- 1, 2, 7, 8
- G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating image descriptions. In CVPR, 2011. 1, 2, 7, 8
- (2011) CVPR
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Choi, Y.⁵ Berg, A.C.⁶ Berg, T.L.⁷

19
- 84907331257
- Generalizing image captions for image-text parallel corpus
- 7
- P. Kuznetsova, V. Ordonez, A. Berg, T. Berg, Y. Choi, and S. Brook. Generalizing image captions for image-text parallel corpus. In ACL, 2013. 7
- (2013) ACL
- Kuznetsova, P.¹ Ordonez, V.² Berg, A.³ Berg, T.⁴ Choi, Y.⁵ Brook, S.⁶

20
- 84878189119
- Collective generation of natural image descriptions
- 1, 2, 5, 7
- P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, 2012. 1, 2, 5, 7
- (2012) ACL
- Kuznetsova, P.¹ Ordonez, V.² Berg, A.C.³ Berg, T.L.⁴ Choi, Y.⁵

21
- 52149112996
- Meteor: An automatic metric for mt evaluation with high levels of correlation with human judgments
- 7
- A. Lavie and A. Agarwal. Meteor: An automatic metric for mt evaluation with high levels of correlation with human judgments. In ACL WMT, 2007. 7
- (2007) ACL WMT
- Lavie, A.¹ Agarwal, A.²

22
- 84862279067
- Composing simple image descriptions using web-scale n-grams
- 1, 2, 3
- S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, 2011. 1, 2, 3
- (2011) CoNLL
- Li, S.¹ Kulkarni, G.² Berg, T.L.³ Berg, A.C.⁴ Choi, Y.⁵

23
- 84906505935
- Microsoft coco: Common objects in context
- 5
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick. Microsoft coco: Common objects in context. arXiv preprint, 1405.0312, 2014. 5
- (2014) Microsoft Coco: Common Objects in Context
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollar, P.⁷ Zitnick, C.L.⁸

24
- 3042535216
- Distinctive image features from scale-invariant keypoints
- 5, 10
- D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, (2): 91-110, 2004. 5, 10
- (2004) IJCV , Issue.2 , pp. 91-110
- Lowe, D.G.¹

25
- 34948830130
- Semantic hierarchies for visual object recognition
- 2
- M. Marszalek and C. Schmid. Semantic hierarchies for visual object recognition. In CVPR, 2007. 2
- (2007) CVPR
- Marszalek, M.¹ Schmid, C.²

26
- 84884545084
- Large scale metric learning for distance-based image classification
- 2, 3, 11
- T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Large scale metric learning for distance-based image classification. Technical report, LEAR-INRIA and TVPA-XRCE, 2012. 2, 3, 11
- (2012) Large Scale Metric Learning for Distance-based Image Classification
- Mensink, T.¹ Verbeek, J.² Perronnin, F.³ Csurka, G.⁴

27
- 84883488616
- Metric learning for large scale image classification: Generalizing to new classes at near-zero cost
- 2, 3, 11
- T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV, 2012. 2, 3, 11
- (2012) ECCV
- Mensink, T.¹ Verbeek, J.² Perronnin, F.³ Csurka, G.⁴

28
- 85034832841
- Midge: Generating image descriptions from computer vision detections
- 1, 2, 3, 5
- M. Mitchell, J. Dodge, A. Goyal, K. Yamaguchi, K. Stratos, X. Han, A. Mensch, A. Berg, T. Berg, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012. 1, 2, 3, 5
- (2012) EACL
- Mitchell, M.¹ Dodge, J.² Goyal, A.³ Yamaguchi, K.⁴ Stratos, K.⁵ Han, X.⁶ Mensch, A.⁷ Berg, A.⁸ Berg, T.⁹ Daumé, H.¹⁰

29
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- 2, 5, 7
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011. 2, 5, 7
- (2011) NIPS
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

30
- 85133336275
- Bleu: A method for automatic evaluation of machine translation
- 6
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002. 6
- (2002) ACL
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

31
- 79959771606
- Improving the fisher kernel for large-scale image classification
- 5
- F. Perronnin, J. Sánchez, and T. Mensink. Improving the fisher kernel for large-scale image classification. In ECCV, 2010. 5
- (2010) ECCV
- Perronnin, F.¹ Sánchez, J.² Mensink, T.³

32
- 0026899240
- Acceleration of stochastic approximation by averaging
- 4
- B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30 (4): 838-855, 1992. 4
- (1992) SIAM Journal on Control and Optimization , vol.30 , Issue.4 , pp. 838-855
- Polyak, B.T.¹ Juditsky, A.B.²

33
- 85090348677
- Collecting image annotations using amazon's mechanical turk
- 5
- C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using amazon's mechanical turk. In Proceedings of NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, 2010. 5
- (2010) Proceedings of NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
- Rashtchian, C.¹ Young, P.² Hodosh, M.³ Hockenmaier, J.⁴

34
- 84947041871
- Imagenet large scale visual recognition challenge
- 5
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. Imagenet large scale visual recognition challenge. IJCV, 2015. 5
- (2015) IJCV
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰ Berg, A.C.¹¹ Fei-Fei, L.¹²

35
- 80052889458
- Recognition using visual phrases
- 1, 3
- M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011. 1, 3
- (2011) CVPR
- Sadeghi, M.A.¹ Farhadi, A.²

36
- 80052905403
- Learning to share visual appearance for multiclass object detection
- 2
- R. Salakhutdinov, A. Torralba, and J. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011. 2
- (2011) CVPR
- Salakhutdinov, R.¹ Torralba, A.² Tenenbaum, J.³

37
- 80052885179
- High-dimensional signature compression for large-scale image classification
- 1, 3
- J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In CVPR, 2011. 1, 3
- (2011) CVPR
- Sánchez, J.¹ Perronnin, F.²

38
- 0031268931
- Bidirectional recurrent neural networks
- 7
- M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. TSP, 45 (11): 2673-2681, 1997. 7
- (1997) TSP , vol.45 , Issue.11 , pp. 2673-2681
- Schuster, M.¹ Paliwal, K.K.²

39
- 84943761635
- Very deep convolutional networks for large-scale image recognition
- 5, 7
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In CVPR, 2015. 5, 7
- (2015) CVPR
- Simonyan, K.¹ Zisserman, A.²

40
- 84937522268
- Going deeper with convolutions
- 7
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015. 7
- (2015) CVPR
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

41
- 84871392832
- Efficient image annotation for automatic sentence generation
- 1, 3, 4, 5, 6, 7, 11, 12
- Y. Ushiku, T. Harada, and Y. Kuniyoshi. Efficient image annotation for automatic sentence generation. In ACMMM, 2012. 1, 3, 4, 5, 6, 7, 11, 12
- (2012) ACMMM
- Ushiku, Y.¹ Harada, T.² Kuniyoshi, Y.³

42
- 25844477556
- Less: A model-based classifier for sparse subspaces
- 2, 3
- C. J. Veenman and D. M. Tax. Less: A model-based classifier for sparse subspaces. PAMI, 27 (9): 1496-500, 2005. 2, 3
- (2005) PAMI , vol.27 , Issue.9 , pp. 1496-1500
- Veenman, C.J.¹ Tax, D.M.²

43
- 84884963254
- Generating image descriptions using semantic similarities in the output space
- 1, 3, 5, 7
- Y. Verma, A. Gupta, P. Mannem, and C. Jawahar. Generating image descriptions using semantic similarities in the output space. In Proceedings of CVPR Workshop on Language for Vision, 2013. 1, 3, 5, 7
- (2013) Proceedings of CVPR Workshop on Language for Vision
- Verma, Y.¹ Gupta, A.² Mannem, P.³ Jawahar, C.⁴

44
- 84946747440
- Show and tell: A neural image caption generator
- 3, 7, 8
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015. 3, 7, 8
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

45
- 33749550361
- Distance metric learning for large margin nearest neighbor classification
- 3, 7
- K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2006. 3, 7
- (2006) NIPS
- Weinberger, K.Q.¹ Blitzer, J.² Saul, L.K.³

46
- 77955654853
- Large scale image annotation: Learning to rank with joint word-image embeddings
- 4, 11
- J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. Machine Learning, 81: 21-35, 2010. 4, 11
- (2010) Machine Learning , vol.81 , pp. 21-35
- Weston, J.¹ Bengio, S.² Usunier, N.³

47
- 84867117593
- Wsabie: Scaling up to large vocabulary image annotation
- 2, 3, 4, 11
- J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In IJCAI, 2011. 2, 3, 4, 11
- (2011) IJCAI
- Weston, J.¹ Bengio, S.² Usunier, N.³

48
- 80053258778
- Corpus-guided sentence generation of natural images
- 1, 2, 5, 7, 8
- Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, 2011. 1, 2, 5, 7, 8
- (2011) EMNLP
- Yang, Y.¹ Teo, C.L.² Daumé, H.³ Aloimonos, Y.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.