-
3
-
-
85161984064
-
Simultaneous object detection and ranking with weak supervision
-
Blaschko, M., Vedaldi, A., Zisserman, A.: Simultaneous object detection and ranking with weak supervision. In: Advances in Neural Information Processing Systems (NIPS), pp. 235–243 (2010)
-
(2010)
Advances in Neural Information Processing Systems (NIPS)
, pp. 235-243
-
-
Blaschko, M.1
Vedaldi, A.2
Zisserman, A.3
-
7
-
-
72449136144
-
Imagenet: A large-scale hierarchical image database
-
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
-
(2009)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Deng, J.1
Dong, W.2
Socher, R.3
Li, L.J.4
Li, K.5
Fei-Fei, L.6
-
9
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
10
-
-
77951298115
-
The Pascal Visual Object Classes (VOC) challenge
-
Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The Pascal Visual Object Classes (VOC) challenge. Int. J. Comput. Vis. (IJCV) 88(2), 303–338 (2010)
-
(2010)
Int. J. Comput. Vis. (IJCV)
, vol.88
, Issue.2
, pp. 303-338
-
-
Everingham, M.1
Van Gool, L.2
Williams, C.K.3
Winn, J.4
Zisserman, A.5
-
14
-
-
84906484732
-
Improving image-sentence embeddings using large weakly annotated photo collections
-
Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.), Springer, Switzerland
-
Gong, Y., Wang, L., Hodosh, M., Hockenmaier, J., Lazebnik, S.: Improving image-sentence embeddings using large weakly annotated photo collections. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part IV. LNCS, vol. 8692, pp. 529–545. Springer, Switzerland (2014)
-
(2014)
ECCV 2014, Part IV. LNCS
, vol.8692
, pp. 529-545
-
-
Gong, Y.1
Wang, L.2
Hodosh, M.3
Hockenmaier, J.4
Lazebnik, S.5
-
15
-
-
85131224768
-
Open-vocabulary object retrieval
-
Guadarrama, S., Rodner, E., Saenko, K., Zhang, N., Farrell, R., Donahue, J., Darrell, T.: Open-vocabulary object retrieval. In: Robotics: Science and Systems (2014)
-
(2014)
Robotics: Science and Systems
-
-
Guadarrama, S.1
Rodner, E.2
Saenko, K.3
Zhang, N.4
Farrell, R.5
Donahue, J.6
Darrell, T.7
-
16
-
-
84973911419
-
Delving deep into rectifiers: Surpassing human-level performance on imagenet classification
-
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
18
-
-
84986305787
-
Natural language object retrieval
-
Hu, R., Xu, H., Rohrbach, M., Feng, J., Saenko, K., Darrell, T.: Natural language object retrieval. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
-
(2016)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Hu, R.1
Xu, H.2
Rohrbach, M.3
Feng, J.4
Saenko, K.5
Darrell, T.6
-
20
-
-
84913580146
-
Caffe: Convolutional architecture for fast feature embedding
-
ACM
-
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
-
(2014)
Proceedings of the ACM International Conference on Multimedia
, pp. 675-678
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
21
-
-
84986312327
-
-
arXiv:1506.06272
-
Jin, J., Fu, K., Cui, R., Sha, F., Zhang, C.: Aligning where to see and what to tell: image caption with region-based attention and scene factorization. arXiv:1506.06272 (2015)
-
(2015)
Aligning Where to See and What to Tell: Image Caption with Region-Based Attention and Scene Factorization
-
-
Jin, J.1
Fu, K.2
Cui, R.3
Sha, F.4
Zhang, C.5
-
22
-
-
84959233256
-
Image retrieval using scene graphs
-
Johnson, J., Krishna, R., Stark, M., Li, L.J., Shamma, D., Bernstein, M., Fei-Fei, L.: Image retrieval using scene graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3668–3678 (2015)
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, pp. 3668-3678
-
-
Johnson, J.1
Krishna, R.2
Stark, M.3
Li, L.J.4
Shamma, D.5
Bernstein, M.6
Fei-Fei, L.7
-
23
-
-
84906344543
-
Efficient image and video co-localization with Frank-Wolfe algorithm
-
Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.), Springer, Heidelberg
-
Joulin, A., Tang, K., Fei-Fei, L.: Efficient image and video co-localization with Frank-Wolfe algorithm. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VI. LNCS, vol. 8694, pp. 253–268. Springer, Heidelberg (2014)
-
(2014)
ECCV 2014, Part VI. LNCS
, vol.8694
, pp. 253-268
-
-
Joulin, A.1
Tang, K.2
Fei-Fei, L.3
-
26
-
-
84943540775
-
Referit game: Referring to objects in photographs of natural scenes
-
Kazemzadeh, S., Ordonez, V., Matten, M., Berg, T.L.: Referit game: referring to objects in photographs of natural scenes. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014)
-
(2014)
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
-
-
Kazemzadeh, S.1
Ordonez, V.2
Matten, M.3
Berg, T.L.4
-
28
-
-
84911370987
-
What are you talking about? Text-to-image coreference
-
IEEE
-
Kong, C., Lin, D., Bansal, M., Urtasun, R., Fidler, S.: What are you talking about? Text-to-image coreference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3558–3565. IEEE (2014)
-
(2014)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, pp. 3558-3565
-
-
Kong, C.1
Lin, D.2
Bansal, M.3
Urtasun, R.4
Fidler, S.5
-
29
-
-
84906923690
-
Jointly learning to parse and perceive: Connecting natural language to the physical world
-
Krishnamurthy, J., Kollar, T.: Jointly learning to parse and perceive: connecting natural language to the physical world. Trans. Assoc. Comput. Linguist. (TACL) 1, 193–206 (2013)
-
(2013)
Trans. Assoc. Comput. Linguist. (TACL)
, vol.1
, pp. 193-206
-
-
Krishnamurthy, J.1
Kollar, T.2
-
30
-
-
84973884868
-
Unsupervised object discovery and tracking in video collections
-
Kwak, S., Cho, M., Laptev, I., Ponce, J., Schmid, C.: Unsupervised object discovery and tracking in video collections. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision (ICCV)
-
-
Kwak, S.1
Cho, M.2
Laptev, I.3
Ponce, J.4
Schmid, C.5
-
31
-
-
84911442106
-
Visual semantic search: Retrieving videos via complex textual queries
-
IEEE
-
Lin, D., Fidler, S., Kong, C., Urtasun, R.: Visual semantic search: retrieving videos via complex textual queries. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2657–2664. IEEE (2014)
-
(2014)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, pp. 2657-2664
-
-
Lin, D.1
Fidler, S.2
Kong, C.3
Urtasun, R.4
-
32
-
-
84906493406
-
Microsoft COCO: Common objects in context
-
Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.), Springer, Switzerland
-
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 740–755. Springer, Switzerland (2014)
-
(2014)
ECCV 2014, Part V. LNCS
, vol.8693
, pp. 740-755
-
-
Lin, T.-Y.1
-
33
-
-
84986260074
-
Generation and comprehension of unambiguous object descriptions
-
Mao, J., Huang, J., Toshev, A., Camburu, O., Yuille, A., Murphy, K.: Generation and comprehension of unambiguous object descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
-
(2016)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Mao, J.1
Huang, J.2
Toshev, A.3
Camburu, O.4
Yuille, A.5
Murphy, K.6
-
34
-
-
84867118595
-
A joint model of language and perception for grounded attribute learning
-
Matuszek, C., Fitzgerald, N., Zettlemoyer, L., Bo, L., Fox, D.: A joint model of language and perception for grounded attribute learning. In: Proceedings of the International Conference on Machine Learning (ICML) (2012)
-
(2012)
Proceedings of the International Conference on Machine Learning (ICML)
-
-
Matuszek, C.1
Fitzgerald, N.2
Zettlemoyer, L.3
Bo, L.4
Fox, D.5
-
35
-
-
84973856017
-
Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
-
Plummer, B., Wang, L., Cervantes, C., Caicedo, J., Hockenmaier, J., Lazebnik, S.: Flickr30k entities: collecting region-to-phrase correspondences for richer image-to-sentence models. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision (ICCV)
-
-
Plummer, B.1
Wang, L.2
Cervantes, C.3
Caicedo, J.4
Hockenmaier, J.5
Lazebnik, S.6
-
38
-
-
84990066399
-
-
arXiv:1403.1024
-
Song, H.O., Girshick, R., Jegelka, S., Mairal, J., Harchaoui, Z., Darrell, T.: On learning to localize objects with minimal supervision. arXiv:1403.1024 (2014)
-
(2014)
On Learning to Localize Objects with Minimal Supervision
-
-
Song, H.O.1
Girshick, R.2
Jegelka, S.3
Mairal, J.4
Harchaoui, Z.5
Darrell, T.6
-
39
-
-
84928547704
-
Sequence to sequence learning with neural networks
-
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 3104–3112 (2014)
-
(2014)
Advances in Neural Information Processing Systems (NIPS)
, pp. 3104-3112
-
-
Sutskever, I.1
Vinyals, O.2
Le, Q.V.3
-
40
-
-
84911407409
-
Co-localization in real-world images
-
IEEE
-
Tang, K., Joulin, A., Li, L.J., Fei-Fei, L.: Co-localization in real-world images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2014)
-
(2014)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Tang, K.1
Joulin, A.2
Li, L.J.3
Fei-Fei, L.4
-
41
-
-
84959255361
-
Book2movie: Aligning video scenes with book chapters
-
Tapaswi, M., Bäuml, M., Stiefelhagen, R.: Book2movie: aligning video scenes with book chapters. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1827–1835 (2015)
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, pp. 1827-1835
-
-
Tapaswi, M.1
Bäuml, M.2
Stiefelhagen, R.3
-
42
-
-
84881160857
-
Selective search for object recognition
-
Uijlings, J.R., van de Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. (IJCV) 104(2), 154–171 (2013)
-
(2013)
Int. J. Comput. Vis. (IJCV)
, vol.104
, Issue.2
, pp. 154-171
-
-
Uijlings, J.R.1
van de Sande, K.E.2
Gevers, T.3
Smeulders, A.W.4
-
43
-
-
56449089103
-
Extracting and composing robust features with denoising autoencoders
-
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the International Conference on Machine Learning (ICML) (2008)
-
(2008)
Proceedings of the International Conference on Machine Learning (ICML)
-
-
Vincent, P.1
Larochelle, H.2
Bengio, Y.3
Manzagol, P.A.4
-
44
-
-
84946747440
-
Show and tell: A neural image caption generator
-
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
46
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
Xu, K., Ba, J., Kiros, R., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of the International Conference on Machine Learning (ICML) (2015)
-
(2015)
Proceedings of the International Conference on Machine Learning (ICML)
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Courville, A.4
Salakhutdinov, R.5
Zemel, R.6
Bengio, Y.7
-
47
-
-
84973884896
-
Describing videos by exploiting temporal structure
-
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision (ICCV)
-
-
Yao, L.1
Torabi, A.2
Cho, K.3
Ballas, N.4
Pal, C.5
Larochelle, H.6
Courville, A.7
-
48
-
-
84986240394
-
-
arXiv:1507.05738
-
Yeung, S., Russakovsky, O., Jin, N., Andriluka, M., Mori, G., Fei-Fei, L.: Every moment counts: dense detailed labeling of actions in complex videos. arXiv:1507.05738 (2015)
-
(2015)
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
-
-
Yeung, S.1
Russakovsky, O.2
Jin, N.3
Andriluka, M.4
Mori, G.5
Fei-Fei, L.6
-
49
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2, 67–78 (2014)
-
(2014)
Trans. Assoc. Comput. Linguist
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
52
-
-
84973911532
-
Aligning books and movies: Towards story-like visual explanations by watching movies and reading books
-
Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., Fidler, S.: Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision (ICCV)
-
-
Zhu, Y.1
Kiros, R.2
Zemel, R.3
Salakhutdinov, R.4
Urtasun, R.5
Torralba, A.6
Fidler, S.7
-
53
-
-
84906489617
-
Edge boxes: Locating object proposals from edges
-
Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.), Springer, Switzerland
-
Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part V. LNCS, vol. 8693, pp. 391–405. Springer, Switzerland (2014)
-
(2014)
ECCV 2014, Part V. LNCS
, vol.8693
, pp. 391-405
-
-
Zitnick, C.L.1
Dollár, P.2