SCOPUS Information Retrieval Platform

Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

Volume 2017-January, 2017, Pages 5187-5196

Person search with natural language description

(6) Li, Shuang (a); Xiao, Tong (a); Li, Hongsheng (a); Zhou, Bolei (b); Yue, Dayu (c); Wang, Xiaogang (a)

a CHINESE UNIVERSITY OF HONG KONG (Hong Kong)

b MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

c SENSETIME RESEARCH (China)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; QUERY PROCESSING; RECURRENT NEURAL NETWORKS; SECURITY SYSTEMS;

ATTENTION MECHANISMS; ATTRIBUTE-BASED; IMAGE DATABASE; IMAGE-BASED; NATURAL LANGUAGES; STATE-OF-THE-ART PERFORMANCE; TEXTUAL DESCRIPTION; VIDEO SURVEILLANCE;

PATTERN RECOGNITION;

EID: 85041893972; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2017.551; Document Type: Conference Paper

Times cited : (353)

References (45)

1
- 84973890960
- Vqa: Visual question answering
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh. Vqa: Visual question answering. In ICCV, pages 2425-2433, 2015.
- (2015) ICCV , pp. 2425-2433
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Lawrence Zitnick, C.⁶ Parikh, D.⁷

2
- 84952349295
- Microsoft coco captions: Data collection and evaluation server
- X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
- (2015) Microsoft Coco Captions: Data Collection and Evaluation Server
- Chen, X.¹ Fang, H.² Lin, T.-Y.³ Vedantam, R.⁴ Gupta, S.⁵ Dollár, P.⁶ Zitnick, C.L.⁷

3
- 84944115859
- Learning a recurrent visual representation for image caption generation
- X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. arXiv preprint arXiv:1411.5654, 2014.
- (2014) Learning A Recurrent Visual Representation for Image Caption Generation
- Chen, X.¹ Zitnick, C.L.²

4
- 84913526249
- Pedestrian attribute recognition at far distance
- Y. Deng, P. Luo, C. C. Loy, and X. Tang. Pedestrian attribute recognition at far distance. In ACM MM, pages 789-792, 2014.
- (2014) ACM MM , pp. 789-792
- Deng, Y.¹ Luo, P.² Loy, C.C.³ Tang, X.⁴

5
- 84959250180
- From captions to visual concepts and back
- H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, pages 1473-1482, 2015.
- (2015) CVPR , pp. 1473-1482
- Fang, H.¹ Gupta, S.² Iandola, F.³ Srivastava, R.K.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰

6
- 84898958665
- Devise: A deep visual-semantic embedding model
- A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. Devise: A deep visual-semantic embedding model. In NIPS, pages 2121-2129, 2013.
- (2013) NIPS , pp. 2121-2129
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Mikolov, T.⁶

7
- 84990060711
- Multimodal compact bilinear pooling for visual question answering and visual grounding
- A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach. Multimodal compact bilinear pooling for visual question answering and visual grounding. arXiv preprint arXiv:1606.01847, 2016.
- (2016) Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
- Fukui, A.¹ Park, D.H.² Yang, D.³ Rohrbach, A.⁴ Darrell, T.⁵ Rohrbach, M.⁶

8
- 84965148420
- Are you talking to a machine? Dataset and methods for multilingual image question
- H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? dataset and methods for multilingual image question. In NIPS, pages 2296-2304, 2015.
- (2015) NIPS , pp. 2296-2304
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

9
- 79951590356
- Evaluating appearance models for recognition, reacquisition, and tracking
- D. Gray, S. Brennan, and H. Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In Proc. IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS), number 5, 2007.
- (2007) Proc IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS) , Issue.5
- Gray, D.¹ Brennan, S.² Tao, H.³

10
- 84986274465
- Deep residual learning for image recognition
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 770-778
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

11
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

12
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research, 47:853-899, 2013.
- (2013) Journal of Artificial Intelligence Research , vol.47 , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

13
- 85030448950
- Segmentation from natural language expressions
- R. Hu, M. Rohrbach, and T. Darrell. Segmentation from natural language expressions. arXiv preprint arXiv:1603.06180, 2016.
- (2016) Segmentation from Natural Language Expressions
- Hu, R.¹ Rohrbach, M.² Darrell, T.³

14
- 84986305787
- Natural language object retrieval
- R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell. Natural language object retrieval. CVPR, 2016.
- (2016) CVPR
- Hu, R.¹ Xu, H.² Rohrbach, M.³ Feng, J.⁴ Saenko, K.⁵ Darrell, T.⁶

15
- 84986278097
- Densecap: Fully convolutional localization networks for dense captioning
- J. Johnson, A. Karpathy, and L. Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. arXiv preprint arXiv:1511.07571, 2015.
- (2015) Densecap: Fully Convolutional Localization Networks for Dense Captioning
- Johnson, J.¹ Karpathy, A.² Fei-Fei, L.³

16
- 85041925966
- Object detection in videos with tubelet proposal networks
- K. Kang, H. Li, T. Xiao, W. Ouyang, J. Yan, X. Liu, and X. Wang. Object detection in videos with tubelet proposal networks. In CVPR, 2017.
- (2017) CVPR
- Kang, K.¹ Li, H.² Xiao, T.³ Ouyang, W.⁴ Yan, J.⁵ Liu, X.⁶ Wang, X.⁷

17
- 84986301354
- T-cnn: Tubelets with convolutional neural networks for object detection from videos
- K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, C. Zhang, Z. Wang, R. Wang, X. Wang, et al. T-cnn: Tubelets with convolutional neural networks for object detection from videos. arXiv preprint arXiv:1604.02532, 2016.
- (2016) T-cnn: Tubelets with Convolutional Neural Networks for Object Detection from Videos
- Kang, K.¹ Li, H.² Yan, J.³ Zeng, X.⁴ Yang, B.⁵ Xiao, T.⁶ Zhang, C.⁷ Wang, Z.⁸ Wang, R.⁹ Wang, X.¹⁰

18
- 84986331475
- Object detection from video tubelets with convolutional neural networks
- K. Kang, W. Ouyang, H. Li, and X. Wang. Object detection from video tubelets with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 817-825, 2016.
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 817-825
- Kang, K.¹ Ouyang, W.² Li, H.³ Wang, X.⁴

19
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, pages 3128-3137, 2015.
- (2015) CVPR , pp. 3128-3137
- Karpathy, A.¹ Fei-Fei, L.²

20
- 84978730111
- Visual genome: Connecting language and vision using crowdsourced dense image annotations
- R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016.
- (2016) Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰

21
- 84876231242
- Imagenet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097-1105, 2012.
- (2012) Advances in Neural Information Processing Systems , pp. 1097-1105
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

22
- 84875891971
- Human reidentification with transferred metric learning
- W. Li, R. Zhao, and X. Wang. Human reidentification with transferred metric learning. In ACCV, pages 31-44, 2012.
- (2012) ACCV , pp. 31-44
- Li, W.¹ Zhao, R.² Wang, X.³

23
- 84911383794
- Deepreid: Deep filter pairing neural network for person re-identification
- W. Li, R. Zhao, T. Xiao, and X. Wang. Deepreid: Deep filter pairing neural network for person re-identification. In CVPR, pages 152-159, 2014.
- (2014) CVPR , pp. 152-159
- Li, W.¹ Zhao, R.² Xiao, T.³ Wang, X.⁴

24
- 84955305813
- Person re-identification by local maximal occurrence representation and metric learning
- S. Liao, Y. Hu, X. Zhu, and S. Z. Li. Person re-identification by local maximal occurrence representation and metric learning. In CVPR, pages 2197-2206, 2015.
- (2015) CVPR , pp. 2197-2206
- Liao, S.¹ Hu, Y.² Zhu, X.³ Li, S.Z.⁴

25
- 84906493406
- Microsoft coco: Common objects in context
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In ECCV, pages 740-755, 2014.
- (2014) ECCV , pp. 740-755
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

26
- 84947599757
- Multi-task deep visual-semantic embedding for video thumbnail selection
- W. Liu, T. Mei, Y. Zhang, C. Che, and J. Luo. Multi-task deep visual-semantic embedding for video thumbnail selection. In CVPR, pages 3707-3715, 2015.
- (2015) CVPR , pp. 3707-3715
- Liu, W.¹ Mei, T.² Zhang, Y.³ Che, C.⁴ Luo, J.⁵

27
- 84973896625
- Ask your neurons: A neural-based approach to answering questions about images
- M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, pages 1-9, 2015.
- (2015) ICCV , pp. 1-9
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

28
- 84951072975
- Deep captioning with multimodal recurrent neural networks (m-rnn)
- J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-rnn). arXiv preprint arXiv:1412.6632, 2014.
- (2014) Deep Captioning with Multimodal Recurrent Neural Networks (M-rnn)
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

29
- 65249121810
- Automated flower classification over a large number of classes
- M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Proceedings of Indian Conference on Computer Vision, Graphics & Image Processing, pages 722-729, 2008.
- (2008) Proceedings of Indian Conference on Computer Vision, Graphics & Image Processing , pp. 722-729
- Nilsback, M.-E.¹ Zisserman, A.²

30
- 84990067789
- Image question answering using convolutional neural network with dynamic parameter prediction
- H. Noh, P. H. Seo, and B. Han. Image question answering using convolutional neural network with dynamic parameter prediction. arXiv preprint arXiv:1511.05756, 2015.
- (2015) Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction
- Noh, H.¹ Seo, P.H.² Han, B.³

31
- 85028056906
- Learning deep representations of fine-grained visual descriptions
- S. Reed, Z. Akata, B. Schiele, and H. Lee. Learning deep representations of fine-grained visual descriptions. arXiv preprint arXiv:1605.05395, 2016.
- (2016) Learning Deep Representations of Fine-grained Visual Descriptions
- Reed, S.¹ Akata, Z.² Schiele, B.³ Lee, H.⁴

32
- 84965170394
- Exploring models and data for image question answering
- M. Ren, R. Kiros, and R. Zemel. Exploring models and data for image question answering. In NIPS, pages 2953-2961, 2015.
- (2015) NIPS , pp. 2953-2961
- Ren, M.¹ Kiros, R.² Zemel, R.³

33
- 84962816362
- Image question answering: A visual semantic embedding model and a new dataset
- M. Ren, R. Kiros, and R. Zemel. Image question answering: A visual semantic embedding model and a new dataset. CoRR, abs/1505.02074, 7, 2015.
- (2015) CoRR, abs/1505.02074
- Ren, M.¹ Kiros, R.² Zemel, R.³

34
- 85031713628
- Dualnet: Domain-invariant network for visual question answering
- K. Saito, A. Shin, Y. Ushiku, and T. Harada. Dualnet: Domain-invariant network for visual question answering. arXiv preprint arXiv:1606.06108, 2016.
- (2016) Dualnet: Domain-invariant Network for Visual Question Answering
- Saito, K.¹ Shin, A.² Ushiku, Y.³ Harada, T.⁴

35
- 85018367088
- Deep attributes driven multi-camera person re-identification
- C. Su, S. Zhang, J. Xing, W. Gao, and Q. Tian. Deep attributes driven multi-camera person re-identification. arXiv preprint arXiv:1605.03259, 2016.
- (2016) Deep Attributes Driven Multi-camera Person Re-identification
- Su, C.¹ Zhang, S.² Xing, J.³ Gao, W.⁴ Tian, Q.⁵

36
- 77951180261
- Attribute-based people search in surveillance environments
- D. A. Vaquero, R. S. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk. Attribute-based people search in surveillance environments. In WACV, pages 1-8, 2009.
- (2009) WACV , pp. 1-8
- Vaquero, D.A.¹ Feris, R.S.² Tran, D.³ Brown, L.⁴ Hampapur, A.⁵ Turk, M.⁶

37
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, pages 3156-3164, 2015.
- (2015) CVPR , pp. 3156-3164
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

38
- 85044321254
- P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-ucsd birds 200. 2010.
- (2010) Caltech-ucsd Birds 200
- Welinder, P.¹ Branson, S.² Mita, T.³ Wah, C.⁴ Schroff, F.⁵ Belongie, S.⁶ Perona, P.⁷

39
- 85041926497
- End-to-end deep learning for person search
- T. Xiao, S. Li, B. Wang, L. Lin, and X. Wang. End-to-end deep learning for person search. arXiv preprint arXiv:1604.01850, 2016.
- (2016) End-to-end Deep Learning for Person Search
- Xiao, T.¹ Li, S.² Wang, B.³ Lin, L.⁴ Wang, X.⁵

40
- 84939821074
- Show, attend and tell: Neural image caption generation with visual attention
- K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
- (2015) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Xu, K.¹ Ba, J.² Kiros, R.³ Cho, K.⁴ Courville, A.⁵ Salakhutdinov, R.⁶ Zemel, R.S.⁷ Bengio, Y.⁸

41
- 84998809208
- Stacked attention networks for image question answering
- Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. arXiv preprint arXiv:1511.02274, 2015.
- (2015) Stacked Attention Networks for Image Question Answering
- Yang, Z.¹ He, X.² Gao, J.³ Deng, L.⁴ Smola, A.⁵

42
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67-78, 2014.
- (2014) Transactions of the Association for Computational Linguistics , vol.2 , pp. 67-78
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

43
- 84952062153
- Person re-identification meets image search
- L. Zheng, L. Shen, L. Tian, S. Wang, J. Bu, and Q. Tian. Person re-identification meets image search. arXiv preprint arXiv:1502.02171, 2015.
- (2015) Person Re-identification Meets Image Search
- Zheng, L.¹ Shen, L.² Tian, L.³ Wang, S.⁴ Bu, J.⁵ Tian, Q.⁶

44
- 80052906103
- Person re-identification by probabilistic relative distance comparison
- W.-S. Zheng, S. Gong, and T. Xiang. Person re-identification by probabilistic relative distance comparison. In CVPR, pages 649-656, 2011.
- (2011) CVPR , pp. 649-656
- Zheng, W.-S.¹ Gong, S.² Xiang, T.³

45
- 84986301525
- Simple baseline for visual question answering
- B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, and R. Fergus. Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167, 2015.
- (2015) Simple Baseline for Visual Question Answering
- Zhou, B.¹ Tian, Y.² Sukhbaatar, S.³ Szlam, A.⁴ Fergus, R.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.