SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 49-58

Learning deep representations of fine-grained visual descriptions

(4) Reed, Scott a Akata, Zeynep b Lee, Honglak a Schiele, Bernt b

a UNIVERSITY OF MICHIGAN (United States)

b MAX PLANCK INSTITUTE FOR INFORMATICS (Germany)

Author keywords

[No Author keywords available]

Indexed keywords

CLASSIFICATION (OF INFORMATION); COMPUTER VISION; IMAGE RETRIEVAL; INFORMATION RETRIEVAL; NATURAL LANGUAGE PROCESSING SYSTEMS; PATTERN RECOGNITION; TEXT PROCESSING;

CATEGORY SPECIFICS; DISTINGUISHING CATEGORY; EMBEDDING PROBLEMS; NATURAL LANGUAGE INTERFACES; SHOT CLASSIFICATION; STATE-OF-THE-ART METHODS; TEXT-BASED IMAGE RETRIEVALS; VISUAL RECOGNITION;

VISUAL LANGUAGES;

EID: 84986250442 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.13 Document Type: Conference Paper

Times cited : (947)

References (53)

1
- 84986259594
- Label-embedding for image classification
- 2, 3, 5, 7
- Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid. Label-embedding for image classification. IEEE TPAMI, 2015.
- (2015) IEEE TPAMI
- Akata, Z.¹ Perronnin, F.² Harchaoui, Z.³ Schmid, C.⁴

2
- 84959243017
- Evaluation of output embeddings for fine-grained image classification
- 1, 2, 3, 6, 7
- Z. Akata, S. Reed, D. Walter, H. Lee, and B. Schiele. Evaluation of Output Embeddings for Fine-Grained Image Classification. In CVPR, 2015.
- (2015) CVPR
- Akata, Z.¹ Reed, S.² Walter, D.³ Lee, H.⁴ Schiele, B.⁵

3
- 84973882857
- Predicting deep zero-shot convolutional neural networks using textual descriptions
- 1, 2, 5, 8
- J. Ba, K. Swersky, S. Fidler, and R. Salakhutdinov. Predicting deep zero-shot convolutional neural networks using textual descriptions. In ICCV, 2015.
- (2015) ICCV
- Ba, J.¹ Swersky, K.² Fidler, S.³ Salakhutdinov, R.⁴

4
- 85162050606
- Label embedding trees for large multi-class tasks
- 2
- S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In NIPS, 2010.
- (2010) NIPS
- Bengio, S.¹ Weston, J.² Grangier, D.³

5
- 84889607930
- Zero-shot video retrieval using content and concepts
- 2
- J. Dalton, J. Allan, and P. Mirajkar. Zero-shot video retrieval using content and concepts. In CIKM, 2013.
- (2013) CIKM
- Dalton, J.¹ Allan, J.² Mirajkar, P.³

6
- 85198028989
- ImageNet: A large-scale hierarchical image database
- 2
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
- (2009) CVPR
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

7
- 84887325349
- Fine-grained crowdsourcing for fine-grained recognition
- 1, 2
- J. Deng, J. Krause, and L. Fei-Fei. Fine-grained crowdsourcing for fine-grained recognition. In CVPR, 2013.
- (2013) CVPR
- Deng, J.¹ Krause, J.² Fei-Fei, L.³

8
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- 1, 2
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
- (2015) CVPR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

9
- 84919881041
- Decaf: A deep convolutional activation feature for generic visual recognition
- 2
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.
- (2014) ICML
- Donahue, J.¹ Jia, Y.² Vinyals, O.³ Hoffman, J.⁴ Zhang, N.⁵ Tzeng, E.⁶ Darrell, T.⁷

10
- 84866719272
- Discovering localized attributes for fine-grained recognition
- 1, 2
- K. Duan, D. Parikh, D. J. Crandall, and K. Grauman. Discovering localized attributes for fine-grained recognition. In CVPR, 2012.
- (2012) CVPR
- Duan, K.¹ Parikh, D.² Crandall, D.J.³ Grauman, K.⁴

11
- 84898803425
- Write a classifier: Zero-shot learning using purely textual descriptions
- 2, 8
- M. Elhoseiny, B. Saleh, and A. Elgammal. Write a classifier: Zero-shot learning using purely textual descriptions. In ICCV, 2013.
- (2013) ICCV
- Elhoseiny, M.¹ Saleh, B.² Elgammal, A.³

12
- 84898958665
- Devise: A deep visual-semantic embedding model
- 1, 2
- A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, and T. Mikolov. Devise: A deep visual-semantic embedding model. In NIPS, 2013.
- (2013) NIPS
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Mikolov, T.⁶

13
- 84906482165
- Transductive multi-view embedding for zero-shot recognition and annotation
- 1
- Y. Fu, T. M. Hospedales, T. Xiang, Z. Fu, and S. Gong. Transductive multi-view embedding for zero-shot recognition and annotation. In ECCV, 2014.
- (2014) ECCV
- Fu, Y.¹ Hospedales, T.M.² Xiang, T.³ Fu, Z.⁴ Gong, S.⁵

14
- 84941001216
- Transductive multi-view zero-shot learning
- 7
- Y. Fu, T. M. Hospedales, T. Xiang, and S. Gong. Transductive multi-view zero-shot learning. IEEE TPAMI, 37 (11): 2332-2345, 2015.
- (2015) IEEE TPAMI , vol.37 , Issue.11 , pp. 2332-2345
- Fu, Y.¹ Hospedales, T.M.² Xiang, T.³ Gong, S.⁴

15
- 84899708619
- Composite concept discovery for zero-shot video event detection
- 2
- A. Habibian, T. Mensink, and C. G. Snoek. Composite concept discovery for zero-shot video event detection. In Proceedings of International Conference on Multimedia Retrieval, 2014.
- (2014) Proceedings of International Conference on Multimedia Retrieval
- Habibian, A.¹ Mensink, T.² Snoek, C.G.³

16
- 0000679216
- Distributional structure
- 1
- Z. Harris. Distributional structure. Word, 10 (23), 1954.
- (1954) Word , vol.10 , Issue.23
- Harris, Z.¹

17
- 0031573117
- Long short-term memory
- Nov.
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9 (8): 1735-1780, Nov. 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

18
- 84959245593
- Learning hypergraph-regularized attribute predictors
- 7
- S. Huang, M. Elhoseiny, A. Elgammal, and D. Yang. Learning hypergraph-regularized attribute predictors. In CVPR, 2015.
- (2015) CVPR
- Huang, S.¹ Elhoseiny, M.² Elgammal, A.³ Yang, D.⁴

19
- 84969584486
- Batch normalization: Accelerating deep network training by reducing internal covariate shift
- 5
- S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
- (2015) ICML
- Ioffe, S.¹ Szegedy, C.²

20
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- 1, 2
- A. Karpathy and F. Li. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
- (2015) CVPR
- Karpathy, A.¹ Li, F.²

21
- 84959189488
- Ranking and retrieval of image sequences from multiple paragraph queries
- 2
- G. Kim, S. Moon, and L. Sigal. Ranking and retrieval of image sequences from multiple paragraph queries. In CVPR, 2015.
- (2015) CVPR
- Kim, G.¹ Moon, S.² Sigal, L.³

22
- 84876231242
- ImageNet classification with deep convolutional neural networks
- 2
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

23
- 80052901011
- Baby talk: Understanding and generating simple image descriptions
- 1
- G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. Berg, and T. Berg. Baby talk: Understanding and generating simple image descriptions. In CVPR, 2011.
- (2011) CVPR
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Choi, Y.⁵ Berg, A.⁶ Berg, T.⁷

24
- 84894522762
- Attribute-based classification for zero-shot visual object categorization
- 1, 2
- C. Lampert, H. Nickisch, and S. Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE TPAMI, 36 (3): 453-465, 2014.
- (2014) IEEE TPAMI , vol.36 , Issue.3 , pp. 453-465
- Lampert, C.¹ Nickisch, H.² Harmeling, S.³

25
- 85009931853
- Microsoft COCO: Common objects in context
- 1
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- (2014) ECCV
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

26
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-RNN)
- 2
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
- (2015) ICLR
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.⁵

27
- 84986276277
- October, [Online; posted 27-October-2015].
- C. Metz. Facebook's AI can caption photos for the blind on its own, October 2015. [Online; posted 27-October-2015].
- (2015) Facebook's AI Can Caption Photos for the Blind on Its Own
- Metz, C.¹

28
- 84898956512
- Distributed representations of words and phrases and their compositionality
- 1, 4
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
- (2013) NIPS
- Mikolov, T.¹ Sutskever, I.² Chen, K.³ Corrado, G.S.⁴ Dean, J.⁵

29
- 84976702763
- Wordnet: A lexical database for English
- 1
- G. A. Miller. Wordnet: A lexical database for English. CACM, 38 (11): 39-41, 1995.
- (1995) CACM , vol.38 , Issue.11 , pp. 39-41
- Miller, G.A.¹

30
- 84959228762
- Beyond short snippets: Deep networks for video classification
- 2
- J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. In CVPR, 2015.
- (2015) CVPR
- Ng, J.Y.-H.¹ Hausknecht, M.² Vijayanarasimhan, S.³ Vinyals, O.⁴ Monga, R.⁵ Toderici, G.⁶

31
- 80053437179
- Multimodal deep learning
- 2
- J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodal deep learning. In ICML, 2011.
- (2011) ICML
- Ngiam, J.¹ Khosla, A.² Kim, M.³ Nam, J.⁴ Lee, H.⁵ Ng, A.Y.⁶

32
- 65249121810
- Automated flower classification over a large number of classes
- 2
- M.-E. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In ICCVGIP, 2008.
- (2008) ICCVGIP
- Nilsback, M.-E.¹ Zisserman, A.²

33
- 84898979068
- arXiv:1312.5650, 1, 2
- M. Norouzi, T. Mikolov, S. Bengio, Y. Singer, J. Shlens, A. Frome, G. Corrado, and J. Dean. Zero-shot learning by convex combination of semantic embeddings. arXiv:1312.5650, 2013.
- (2013) Zero-shot Learning by Convex Combination of Semantic Embeddings
- Norouzi, M.¹ Mikolov, T.² Bengio, S.³ Singer, Y.⁴ Shlens, J.⁵ Frome, A.⁶ Corrado, G.⁷ Dean, J.⁸

34
- 84908539410
- Learning and transferring mid-level image representations using convolutional neural networks
- M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Learning and transferring mid-level image representations using convolutional neural networks. In CVPR.
- CVPR , vol.2
- Oquab, M.¹ Bottou, L.² Laptev, I.³ Sivic, J.⁴

35
- 85162522202
- Im2Text: Describing images using 1 million captioned photographs
- 1
- V. Ordonez, G. Kulkarni, and T. Berg. Im2Text: Describing images using 1 million captioned photographs. In NIPS, 2011.
- (2011) NIPS
- Ordonez, V.¹ Kulkarni, G.² Berg, T.³

36
- 80053456996
- Zero-shot learning with semantic output codes
- 1, 2
- M. Palatucci, D. Pomerleau, G. Hinton, and T. Mitchell. Zero-shot learning with semantic output codes. In NIPS, 2009.
- (2009) NIPS
- Palatucci, M.¹ Pomerleau, D.² Hinton, G.³ Mitchell, T.⁴

37
- 84961289992
- Glove: Global vectors for word representation
- 1
- J. Pennington, R. Socher, and C. D. Manning. Glove: Global vectors for word representation. In EMNLP, 2014.
- (2014) EMNLP
- Pennington, J.¹ Socher, R.² Manning, C.D.³

38
- 80052892795
- Evaluating knowledge transfer and zero-shot learning in a large-scale setting
- 1, 2
- M. Rohrbach, M. Stark, and B. Schiele. Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In CVPR, 2011.
- (2011) CVPR
- Rohrbach, M.¹ Stark, M.² Schiele, B.³

39
- 84947041871
- Imagenet large scale visual recognition challenge
- 1
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 115 (3): 211-252, 2015.
- (2015) IJCV , vol.115 , Issue.3 , pp. 211-252
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵

40
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- 7
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- (2015) ICLR
- Simonyan, K.¹ Zisserman, A.²

41
- 84898938559
- Zero-shot learning through cross-modal transfer
- 1, 2
- R. Socher, M. Ganjoo, H. Sridhar, O. Bastani, C. Manning, and A. Ng. Zero-shot learning through cross-modal transfer. In NIPS, 2013.
- (2013) NIPS
- Socher, R.¹ Ganjoo, M.² Sridhar, H.³ Bastani, O.⁴ Manning, C.⁵ Ng, A.⁶

42
- 84937873395
- Improved multimodal deep learning with variation of information
- 2
- K. Sohn, W. Shang, and H. Lee. Improved multimodal deep learning with variation of information. In NIPS, 2014.
- (2014) NIPS
- Sohn, K.¹ Shang, W.² Lee, H.³

43
- 84916911784
- Multimodal learning with deep boltzmann machines
- 2
- N. Srivastava and R. Salakhutdinov. Multimodal learning with deep boltzmann machines. JMLR, 15: 2949-2980, 2014.
- (2014) JMLR , vol.15 , pp. 2949-2980
- Srivastava, N.¹ Salakhutdinov, R.²

44
- 84937522268
- Going deeper with convolutions
- 2, 5, 7
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
- (2015) CVPR
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

45
- 84946747440
- Show and tell: A neural image caption generator
- 1, 2
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

46
- 80052891795
- Technical Report CNS-TR-2010-001, Caltech, 1, 2
- P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, Caltech, 2010.
- (2010) Caltech-UCSD Birds 200
- Welinder, P.¹ Branson, S.² Mita, T.³ Wah, C.⁴ Schroff, F.⁵ Belongie, S.⁶ Perona, P.⁷

47
- 77955654853
- Large scale image annotation: Learning to rank with joint word-image embeddings
- 2
- J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. ECML, 2010.
- (2010) ECML
- Weston, J.¹ Bengio, S.² Usunier, N.³

48
- 84911434661
- Zero-shot event detection using multi-modal fusion of weakly supervised concepts
- 2
- S. Wu, S. Bondugula, F. Luisier, X. Zhuang, and P. Natarajan. Zero-shot event detection using multi-modal fusion of weakly supervised concepts. In CVPR, 2014.
- (2014) CVPR
- Wu, S.¹ Bondugula, S.² Luisier, F.³ Zhuang, X.⁴ Natarajan, P.⁵

49
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- 2
- K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
- (2015) ICML
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

50
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- 1
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2: 67-78, 2014.
- (2014) TACL , vol.2 , pp. 67-78
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

51
- 84956617559
- Part-based R-CNNs for fine-grained category detection
- 1, 2
- N. Zhang, J. Donahue, R. Girshick, and T. Darrell. Part-based R-CNNs for fine-grained category detection. In ECCV, 2014.
- (2014) ECCV
- Zhang, N.¹ Donahue, J.² Girshick, R.³ Darrell, T.⁴

52
- 84965162393
- Character-level convolutional networks for text classification
- 2, 3
- X. Zhang, J. Zhao, and Y. LeCun. Character-level convolutional networks for text classification. In NIPS, 2015.
- (2015) NIPS
- Zhang, X.¹ Zhao, J.² LeCun, Y.³

53
- 84973861983
- Conditional random fields as recurrent neural networks
- 2
- S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr. Conditional random fields as recurrent neural networks. In ICCV, 2015.
- (2015) ICCV
- Zheng, S.¹ Jayasumana, S.² Romera-Paredes, B.³ Vineet, V.⁴ Su, Z.⁵ Du, D.⁶ Huang, C.⁷ Torr, P.H.⁸

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.