SCOPUS information retrieval platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 07-12-June-2015, 2015, Pages 4437-4446

Associating neural word embeddings with deep image representations using Fisher Vectors

Authors (4): Klein, Benjamin (a); Lev, Guy (a); Sadeh, Gil (a); Wolf, Lior (a)

(a) Tel Aviv University (Israel)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; GAUSSIAN DISTRIBUTION; IMAGE ANALYSIS; IMAGE RETRIEVAL; IMAGE SEGMENTATION; LAPLACE TRANSFORMS; MAXIMUM PRINCIPLE; PATTERN RECOGNITION; EXPECTATION-MAXIMIZATIONS; GAUSSIAN MIXTURE MODEL; IMAGE ANNOTATION; IMAGE REPRESENTATIONS; LAPLACIAN DISTRIBUTION; LAPLACIAN MIXTURE MODELS; LOG LIKELIHOOD; STATE OF THE ART; VECTORS

EID: 84959196607 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2015.7299073 Document Type: Conference Paper

Times cited: 347

References (48)

1. Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137-1155, Mar. 2003. (EID: 0142166851)

2. K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: An evaluation of recent feature encoding methods. In British Machine Vision Conference, 2011. (EID: 84898420173)

3. S. Clinchant and F. Perronnin. Textual similarity with a bag-of-embedded-words model. In Proceedings of the 2013 Conference on the Theory of Information Retrieval, page 25. ACM, 2013. (EID: 84959216393)

4. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1-38, 1977. (EID: 0002629270)

5. J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. arXiv preprint arXiv:1411.4389v2, 2014. (EID: 84944046597)

6. T. Eltoft, T. Kim, and T.-W. Lee. On the multivariate Laplace distribution. IEEE Signal Processing Letters, 13(5):300-303, 2006. (EID: 33645837850)

7. A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In Computer Vision - ECCV 2010, pages 15-29. Springer, 2010. (EID: 78149311145)

8. R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv preprint arXiv:1311.2524, 2013. (EID: 84906343066)

9. G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012. (EID: 84890466217)

10. M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. J. Artif. Intell. Res. (JAIR), 47:853-899, 2013. (EID: 84883394520)

11. P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2014. (EID: 84906494296)

12. H. Hotelling. Relations between two sets of variates. Biometrika, pages 321-377, 1936. (EID: 0000107975)

13. Y. Jia and T. Darrell. Heavy-tailed distances for gradient based image descriptors. In Advances in Neural Information Processing Systems, pages 397-405, 2011. (EID: 85162303334)

14. A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. Technical report, Computer Science Department, Stanford University, 2014. (EID: 84942676733)

15. A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. arXiv preprint arXiv:1406.5679, 2014. (EID: 84959252592)

16. Y. Ke and R. Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), volume 2, pages II-506. IEEE, 2004. (EID: 5044233274)

17. R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539, 2014. (EID: 84944113729)

18. G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1601-1608. IEEE, 2011. (EID: 80052901011)

19. P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Volume 1, pages 359-368. Association for Computational Linguistics, 2012. (EID: 84878189119)

20. Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, volume 32 of JMLR Proceedings, pages 1188-1196. JMLR.org, 2014. (EID: 84919829999)

21. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998. (EID: 0032203257)

22. T.-W. Lee. Independent component analysis: Theory and applications [book review]. IEEE Transactions on Neural Networks, 10(4):982, 1999. (EID: 84863895135)

23. S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pages 220-228. Association for Computational Linguistics, 2011. (EID: 84862279067)

24. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. Zitnick. Microsoft COCO: Common objects in context. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision - ECCV 2014, volume 8693 of Lecture Notes in Computer Science, pages 740-755. Springer International Publishing, 2014. (EID: 84906493406)

25. D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, volume 2, pages 1150-1157. IEEE, 1999. (EID: 0033284915)

26. J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090, 2014. (EID: 84951072975)

27. T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013. (EID: 85083951332)

28. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013. (EID: 84898956512)

29. M. Mitchell, X. Han, J. Dodge, A. Mensch, A. Goyal, A. Berg, K. Yamaguchi, T. Berg, K. Stratos, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 747-756. Association for Computational Linguistics, 2012. (EID: 85034832841)

30. A. Mnih and G. Hinton. Three new graphical models for statistical language modelling. In Proceedings of the 24th International Conference on Machine Learning, ICML '07, pages 641-648, New York, NY, USA, 2007. ACM. (EID: 34547970628)

31. R. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125-139, 2001. (EID: 0000273048)

32. X. Peng, C. Zou, Y. Qiao, and Q. Peng. Action recognition with stacked Fisher vectors. In Computer Vision - ECCV 2014, pages 581-595. Springer, 2014. (EID: 84906510060)

33. F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pages 1-8. IEEE, 2007. (EID: 34948815101)

34. F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier. Large-scale image retrieval with compressed Fisher vectors. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3384-3391. IEEE, 2010. (EID: 77955992063)

35. F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In Computer Vision - ECCV 2010, pages 143-156. Springer, 2010. (EID: 78149348137)

36. C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 139-147. Association for Computational Linguistics, 2010. (EID: 85090348677)

37. J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105(3):222-245, 2013. (EID: 84883487458)

38. K. Simonyan, O. M. Parkhi, A. Vedaldi, and A. Zisserman. Fisher vector faces in the wild. In Proc. BMVC, volume 1, page 7, 2013. (EID: 84898428370)

39. K. Simonyan, A. Vedaldi, and A. Zisserman. Deep Fisher networks for large-scale image classification. In Advances in Neural Information Processing Systems, pages 163-171, 2013. (EID: 84897374881)

40. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014. (EID: 84933585162)

41. J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering objects and their location in images. In Tenth IEEE International Conference on Computer Vision (ICCV 2005), volume 1, pages 370-377. IEEE, 2005. (EID: 33745938597)

42. R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 966-973. IEEE, 2010. (EID: 77955998009)

43. R. Socher, Q. Le, C. Manning, and A. Ng. Grounded compositional semantics for finding and describing images with sentences. In NIPS Deep Learning Workshop, 2013. (EID: 84928030723)

44. R. Socher, C. C. Lin, A. Y. Ng, and C. D. Manning. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011. (EID: 80053438267)

45. V. Sydorov, M. Sakurada, and C. H. Lampert. Deep Fisher kernels: End to end learning of the Fisher kernel GMM parameters. In 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2014. (EID: 84911395964)

46. H. Vinod. Canonical ridge and econometrics of joint production. Journal of Econometrics, 4(2):147-166, 1976. (EID: 0001318292)

47. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. arXiv preprint arXiv:1411.4555, 2014. (EID: 84939821075)

48. C. L. Zitnick, D. Parikh, and L. Vanderwende. Learning the visual interpretation of sentences. In 2013 IEEE International Conference on Computer Vision (ICCV), pages 1681-1688. IEEE, 2013. (EID: 84898772194)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.