SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 4555-4564

Natural language object retrieval

(6) Hu, Ronghang a Xu, Huazhe b Rohrbach, Marcus a,c Feng, Jiashi d Saenko, Kate e Darrell, Trevor a

a UNIVERSITY OF CALIFORNIA (United States)

b TSINGHUA UNIVERSITY (China)

c ICSI (United States)

d NATIONAL UNIVERSITY OF SINGAPORE (Singapore)

e UNIVERSITY OF MASSACHUSETTS (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; IMAGE PROCESSING; KNOWLEDGE MANAGEMENT; PATTERN RECOGNITION;

CONTEXTUAL INFORMATION; GLOBAL INFORMATIONS; LINGUISTIC KNOWLEDGE; LOCAL IMAGE DESCRIPTORS; NATURAL LANGUAGE QUERIES; SPATIAL CONFIGURATION; SPATIAL INFORMATIONS; TEXT-BASED IMAGE RETRIEVALS;

IMAGE RETRIEVAL;

EID: 84986305787 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.493 Document Type: Conference Paper

Times cited : (654)

References (33)

1
- 84898444333
- Multiple queries for large scale specific object retrieval
- R. Arandjelovic and A. Zisserman. Multiple queries for large scale specific object retrieval. In Proceedings of the British Machine Vision Conference (BMVC), pages 1-11, 2012.
- (2012) Proceedings of the British Machine Vision Conference (BMVC) , pp. 1-11
- Arandjelovic, R.¹ Zisserman, A.²

2
- 84887378604
- Fast, accurate detection of 100,000 object classes on a single machine
- IEEE
- T. Dean, M. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, J. Yagnik, et al. Fast, accurate detection of 100,000 object classes on a single machine. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 1814-1821. IEEE, 2013.
- (2013) Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on , pp. 1814-1821
- Dean, T.¹ Ruzon, M.² Segal, M.³ Shlens, J.⁴ Vijayanarasimhan, S.⁵ Yagnik, J.⁶

3
- 85198028989
- Imagenet: A large-scale hierarchical image database
- IEEE
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248-255. IEEE, 2009.
- (2009) Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on , pp. 248-255
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

4
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2625-2634, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 2625-2634
- Donahue, J.¹ Anne Hendricks, L.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

5
- 77649188328
- The segmented and annotated iapr tc-12 benchmark
- H. J. Escalante, C. A. Hernández, J. A. Gonzalez, A. López-López, M. Montes, E. F. Morales, L. E. Sucar, L. Villasenor, and M. Grubinger. The segmented and annotated iapr tc-12 benchmark. Computer Vision and Image Understanding, 114(4):419-428, 2010.
- (2010) Computer Vision and Image Understanding , vol.114 , Issue.4 , pp. 419-428
- Escalante, H.J.¹ Hernández, C.A.² Gonzalez, J.A.³ López-López, A.⁴ Montes, M.⁵ Morales, E.F.⁶ Sucar, L.E.⁷ Villasenor, L.⁸ Grubinger, M.⁹

6
- 84898958665
- Devise: A deep visual-semantic embedding model
- A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. Devise: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems, pages 2121-2129, 2013.
- (2013) Advances in Neural Information Processing Systems , pp. 2121-2129
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Mikolov, T.⁶

7
- 84986248789
- Fast r-cnn
- R. Girshick. Fast R-CNN. In International Conference on Computer Vision (ICCV), 2015.
- (2015) International Conference on Computer Vision (ICCV)
- Girshick, R.¹

8
- 84911400494
- Rich feature hierarchies for accurate object detection and semantic segmentation
- IEEE
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 580-587. IEEE, 2014.
- (2014) Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on , pp. 580-587
- Girshick, R.¹ Donahue, J.² Darrell, T.³ Malik, J.⁴

9
- 38049183286
- The iapr tc-12 benchmark: A new evaluation resource for visual information systems
- M. Grubinger, P. Clough, H. Müller, and T. Deselaers. The iapr tc-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, pages 13-23, 2006.
- (2006) International Workshop OntoImage , pp. 13-23
- Grubinger, M.¹ Clough, P.² Müller, H.³ Deselaers, T.⁴

10
- 85131224768
- Open-vocabulary object retrieval
- S. Guadarrama, E. Rodner, K. Saenko, N. Zhang, R. Farrell, J. Donahue, and T. Darrell. Open-vocabulary object retrieval. In Robotics: Science and Systems, 2014.
- (2014) Robotics: Science and Systems
- Guadarrama, S.¹ Rodner, E.² Saenko, K.³ Zhang, N.⁴ Farrell, R.⁵ Donahue, J.⁶ Darrell, T.⁷

11
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

12
- 84924803045
- Lsda: Large scale detection through adaptation
- J. Hoffman, S. Guadarrama, E. S. Tzeng, R. Hu, J. Donahue, R. Girshick, T. Darrell, and K. Saenko. Lsda: Large scale detection through adaptation. In Advances in Neural Information Processing Systems, pages 3536-3544, 2014.
- (2014) Advances in Neural Information Processing Systems , pp. 3536-3544
- Hoffman, J.¹ Guadarrama, S.² Tzeng, E.S.³ Hu, R.⁴ Donahue, J.⁵ Girshick, R.⁶ Darrell, T.⁷ Saenko, K.⁸

13
- 33845594193
- Learning distance metrics with contextual constraints for image retrieval
- IEEE
- S. C. Hoi, W. Liu, M. R. Lyu, and W.-Y. Ma. Learning distance metrics with contextual constraints for image retrieval. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 2072-2078. IEEE, 2006.
- (2006) Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on , vol.2 , pp. 2072-2078
- Hoi, S.C.¹ Liu, W.² Lyu, M.R.³ Ma, W.-Y.⁴

14
- 85030448950
- arXiv preprint arXiv:1603.06180
- R. Hu, M. Rohrbach, and T. Darrell. Segmentation from natural language expressions. arXiv preprint arXiv:1603.06180, 2016.
- (2016) Segmentation from Natural Language Expressions
- Hu, R.¹ Rohrbach, M.² Darrell, T.³

15
- 84986302997
- arXiv preprint arXiv:1511.04164
- R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell. Natural language object retrieval. arXiv preprint arXiv:1511.04164, 2015.
- (2015) Natural Language Object Retrieval
- Hu, R.¹ Xu, H.² Rohrbach, M.³ Feng, J.⁴ Saenko, K.⁵ Darrell, T.⁶

16
- 84913580146
- Caffe: Convolutional architecture for fast feature embedding
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM Multimedia, volume 2, page 4, 2014.
- (2014) ACM Multimedia , vol.2 , pp. 4
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.B.⁶ Guadarrama, S.⁷ Darrell, T.⁸

17
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Karpathy, A.¹ Fei-Fei, L.²

18
- 84937843643
- Deep fragment embeddings for bidirectional image sentence mapping
- A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In Advances in Neural Information Processing Systems (NIPS), 2014.
- (2014) Advances in Neural Information Processing Systems (NIPS)
- Karpathy, A.¹ Joulin, A.² Fei-Fei, L.³

19
- 84943540775
- Referitgame: Referring to objects in photographs of natural scenes
- S. Kazemzadeh, V. Ordonez, M. Matten, and T. L. Berg. Referitgame: Referring to objects in photographs of natural scenes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 787-798, 2014.
- (2014) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 787-798
- Kazemzadeh, S.¹ Ordonez, V.² Matten, M.³ Berg, T.L.⁴

20
- 84952349298
- Unifying visual-semantic embeddings with multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. Transactions of the Association for Computational Linguistics (TACL), 2015.
- (2015) Transactions of the Association for Computational Linguistics (TACL)
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

21
- 84944130628
- arXiv preprint arXiv:1411.7399
- B. Klein, G. Lev, G. Sadeh, and L. Wolf. Fisher vectors derived from hybrid Gaussian-laplacian mixture models for image annotation. arXiv preprint arXiv:1411.7399, 2014.
- (2014) Fisher Vectors Derived from Hybrid Gaussian-laplacian Mixture Models for Image Annotation
- Klein, B.¹ Lev, G.² Sadeh, G.³ Wolf, L.⁴

22
- 84911370987
- What are you talking about? Text-to-image coreference
- IEEE
- C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? text-to-image coreference. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 3558-3565. IEEE, 2014.
- (2014) Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on , pp. 3558-3565
- Kong, C.¹ Lin, D.² Bansal, M.³ Urtasun, R.⁴ Fidler, S.⁵

23
- 84906493406
- Microsoft coco: Common objects in context
- Springer
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In Computer Vision-ECCV 2014, pages 740-755. Springer, 2014.
- (2014) Computer Vision-ECCV 2014 , pp. 740-755
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

24
- 84986260074
- Generation and comprehension of unambiguous object descriptions
- J. Mao, J. Huang, A. Toshev, O. Camburu, A. Yuille, and K. Murphy. Generation and comprehension of unambiguous object descriptions. Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, 2016.
- (2016) Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on
- Mao, J.¹ Huang, J.² Toshev, A.³ Camburu, O.⁴ Yuille, A.⁵ Murphy, K.⁶

25
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-rnn)
- J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-rnn). In Proceedings of the International Conference on Learning Representations, 2015.
- (2015) Proceedings of the International Conference on Learning Representations
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

26
- 84973856017
- Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
- B. Plummer, L. Wang, C. Cervantes, J. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
- (2015) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Plummer, B.¹ Wang, L.² Cervantes, C.³ Caicedo, J.⁴ Hockenmaier, J.⁵ Lazebnik, S.⁶

27
- 84962816362
- Image question answering: A visual semantic embedding model and a new dataset
- M. Ren, R. Kiros, and R. Zemel. Image question answering: A visual semantic embedding model and a new dataset. In Advances in Neural Information Processing Systems (NIPS), 2015.
- (2015) Advances in Neural Information Processing Systems (NIPS)
- Ren, M.¹ Kiros, R.² Zemel, R.³

28
- 84986327251
- arXiv preprint arXiv:1511.03745
- A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. Grounding of textual phrases in images by reconstruction. arXiv preprint arXiv:1511.03745, 2015.
- (2015) Grounding of Textual Phrases in Images by Reconstruction
- Rohrbach, A.¹ Rohrbach, M.² Hu, R.³ Darrell, T.⁴ Schiele, B.⁵

29
- 84945944033
- Imagenet large scale visual recognition challenge
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, pages 1-42, 2014.
- (2014) International Journal of Computer Vision , pp. 1-42
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰

30
- 84925410541
- arXiv preprint arXiv:1409.1556
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

31
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3156-3164, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 3156-3164
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

32
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. Proceedings of the International Conference on Machine Learning (ICML), 2015.
- (2015) Proceedings of the International Conference on Machine Learning (ICML)
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

33
- 84906489617
- Edge boxes: Locating object proposals from edges
- Springer
- C. L. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. In Proceedings of the European Conference on Computer Vision (ECCV), pages 391-405. Springer, 2014.
- (2014) Proceedings of the European Conference on Computer Vision (ECCV) , pp. 391-405
- Zitnick, C.L.¹ Dollár, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.