SCOPUS Information Retrieval Platform

Proceedings of the IEEE International Conference on Computer Vision

Volume 2015 International Conference on Computer Vision, ICCV 2015, 2015, Pages 2623-2631

Multimodal convolutional neural networks for matching image and sentence

(4) Ma, Lin (a); Lu, Zhengdong (a); Shang, Lifeng (a); Li, Hang (a)

(a) Huawei Noah's Ark Lab (China)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; CONVOLUTION; NEURAL NETWORKS; SEMANTICS;

CNN MODELS; CONVOLUTIONAL NEURAL NETWORK; END TO END; IMAGE CONTENT; IMAGE REPRESENTATIONS; MULTI-MODAL; STATE-OF-THE-ART APPROACH;

IMAGE MATCHING;

EID: 84973864182 PISSN: 15505499 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICCV.2015.301 Document Type: Conference Paper

Times cited : (378)

References (37)

1
- 84867605836
- Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition
- 3
- O. Abdel Hamid, A. R. Mohamed, H. Jiang, and G. Penn. Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. ICASSP, 2012. 3
- (2012) ICASSP
- Abdel Hamid, O.¹ Mohamed, A.R.² Jiang, H.³ Penn, G.⁴

2
- 0000782329
- Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping
- 6
- R. Caruana, S. Lawrence, and C. L. Giles. Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. NIPS, 2000. 6
- (2000) NIPS
- Caruana, R.¹ Lawrence, S.² Giles, C.L.³

3
- 84944115859
- arXiv:1411.5654. 2, 6, 7
- X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. arXiv:1411.5654, 2014. 2, 6, 7
- (2014) Learning A Recurrent Visual Representation for Image Caption Generation
- Chen, X.¹ Zitnick, C.L.²

4
- 84890527827
- Improving deep neural networks for lvcsr using rectified linear units and dropout
- 2
- G. E. Dahl, T. N. Sainath, and G. E. Hinton. Improving deep neural networks for lvcsr using rectified linear units and dropout. ICASSP, 2013. 2
- (2013) ICASSP
- Dahl, G.E.¹ Sainath, T.N.² Hinton, G.E.³

5
- 84944046597
- arXiv:1411.4389. 2, 7
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. arXiv:1411.4389, 2014. 2, 7
- (2014) Long-term Recurrent Convolutional Networks for Visual Recognition and Description
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

6
- 84898958665
- Devise: A deep visual-semantic embedding model
- 1, 2, 6, 7, 8
- A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. A. Ranzato, and T. Mikolov. Devise: A deep visual-semantic embedding model. NIPS, 2013. 1, 2, 6, 7, 8
- (2013) NIPS
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Ranzato, M.A.⁶ Mikolov, T.⁷

7
- 84911400494
- Rich feature hierarchies for accurate object detection and semantic segmentation
- 8
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR, 2014. 8
- (2014) CVPR
- Girshick, R.¹ Donahue, J.² Darrell, T.³ Malik, J.⁴

8
- 84959243872
- Improving image-sentence embeddings using large weakly annotated photo collections
- 1
- Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. ECCV, 2014. 1
- (2014) ECCV
- Gong, Y.¹ Wang, L.² Hodosh, M.³ Hockenmaier, J.⁴ Lazebnik, S.⁵

9
- 34447620428
- A neural network to retrieve images from text queries
- 2
- D. Grangier and S. Bengio. A neural network to retrieve images from text queries. ICANN, 2006. 2
- (2006) ICANN
- Grangier, D.¹ Bengio, S.²

10
- 84928278589
- Spatial pyramid pooling in deep convolutional networks for visual recognition
- 1
- K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. ECCV, 2014. 1
- (2014) ECCV
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

11
- 84973911419
- arXiv:1502.01852. 1
- K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: surpassing human-level performance on imagenet classification. arXiv:1502.01852, 2015. 1
- (2015) Delving Deep into Rectifiers: Surpassing Human-level Performance on Imagenet Classification
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

12
- 84867720412
- arXiv:1207.0580. 6
- G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580, 2012. 6
- (2012) Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
- Hinton, G.E.¹ Srivastava, N.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

13
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- 1, 2, 6
- M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research, 47: 853-899, 2013. 1, 2, 6
- (2013) Journal of Artificial Intelligence Research , vol.47 , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

14
- 84937936034
- Convolutional neural network architectures for matching natural language sentences
- 1, 3
- B. Hu, Z. Lu, H. Li, and Q. Chen. Convolutional neural network architectures for matching natural language sentences. NIPS, 2014. 1, 3
- (2014) NIPS
- Hu, B.¹ Lu, Z.² Li, H.³ Chen, Q.⁴

15
- 84906922163
- A convolutional neural network for modelling sentences
- 1
- N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A convolutional neural network for modelling sentences. ACL, 2014. 1
- (2014) ACL
- Kalchbrenner, N.¹ Grefenstette, E.² Blunsom, P.³

16
- 84937843643
- Deep fragment embeddings for bidirectional image sentence mapping
- 1, 2, 6, 7, 8
- A. Karpathy, A. Joulin, and F.-F. Li. Deep fragment embeddings for bidirectional image sentence mapping. NIPS, 2014. 1, 2, 6, 7, 8
- (2014) NIPS
- Karpathy, A.¹ Joulin, A.² Li, F.-F.³

17
- 84942676733
- arXiv:1412.2306. 2, 6, 7
- A. Karpathy and F.-F. Li. Deep visual-semantic alignments for generating image descriptions. arXiv:1412.2306, 2014. 2, 6, 7
- (2014) Deep Visual-semantic Alignments for Generating Image Descriptions
- Karpathy, A.¹ Li, F.-F.²

18
- 84961376850
- Convolutional neural networks for sentence classification
- 1
- Y. Kim. Convolutional neural networks for sentence classification. EMNLP, 2014. 1
- (2014) EMNLP
- Kim, Y.¹

19
- 84919921461
- Multimodal neural language models
- 2
- R. Kiros, R. Salakhutdinov, and R. Zemel. Multimodal neural language models. ICML, 2014. 2
- (2014) ICML
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.³

20
- 84944113729
- arXiv:1411.2539. 2, 6, 7
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv:1411.2539, 2014. 2, 6, 7
- (2014) Unifying Visual-semantic Embeddings with Multimodal Neural Language Models
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

21
- 84877777478
- Deep representations and codes for image auto-annotation
- 1
- R. Kiros and C. Szepesvári. Deep representations and codes for image auto-annotation. NIPS, 2012. 1
- (2012) NIPS
- Kiros, R.¹ Szepesvári, C.²

22
- 84965122839
- Convolutional networks for images, speech and time series
- 3
- Y. LeCun and Y. Bengio. Convolutional networks for images, speech and time series. The Handbook of Brain Theory and Neural Networks, 3361, 1995. 3
- (1995) The Handbook of Brain Theory and Neural Networks , pp. 3361
- LeCun, Y.¹ Bengio, Y.²

23
- 84939821073
- arXiv:1412.6632. 2, 7
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv:1412.6632, 2014. 2, 7
- (2014) Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.L.⁵

24
- 84951072975
- arXiv:1410.1090. 2, 6, 7
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv:1410.1090, 2014. 2, 6, 7
- (2014) Explain Images with Multimodal Recurrent Neural Networks
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.L.⁵

25
- 85083951332
- arXiv:1301.3781. 6
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013. 6
- (2013) Efficient Estimation of Word Representations in Vector Space
- Mikolov, T.¹ Chen, K.² Corrado, G.³ Dean, J.⁴

26
- 85162522202
- Im2txt: Describing images using 1 million captioned photographs
- 1
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2txt: Describing images using 1 million captioned photographs. NIPS, 2011. 1
- (2011) NIPS
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

27
- 80052889458
- Recognition using visual phrases
- 1, 2
- M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. CVPR, 2011. 1, 2
- (2011) CVPR
- Sadeghi, M.A.¹ Farhadi, A.²

28
- 85083951635
- arXiv:1312.6229. 2, 5, 6, 7
- P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv:1312.6229, 2014. 2, 5, 6, 7
- (2014) Overfeat: Integrated Recognition, Localization and Detection Using Convolutional Networks
- Sermanet, P.¹ Eigen, D.² Zhang, X.³ Mathieu, M.⁴ Fergus, R.⁵ LeCun, Y.⁶

29
- 84925410541
- arXiv:1409.1556. 1, 2, 5, 6, 7
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014. 1, 2, 5, 6, 7
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

30
- 84964474107
- Grounded compositional semantics for finding and describing images with sentences
- 1, 2, 6, 7, 8
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics, 2: 207-218, 2014. 1, 2, 6, 7, 8
- (2014) Transactions of the Association for Computational Linguistics , vol.2 , pp. 207-218
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

31
- 84946596683
- Learning representations for multimodal data with deep belief nets
- 1, 2
- N. Srivastava and R. Salakhutdinov. Learning representations for multimodal data with deep belief nets. ICML Representation Learning Workshop, 2012. 1, 2
- (2012) ICML Representation Learning Workshop
- Srivastava, N.¹ Salakhutdinov, R.²

32
- 84877724347
- Multimodal learning with deep boltzmann machines
- 1, 2
- N. Srivastava and R. Salakhutdinov. Multimodal learning with deep boltzmann machines. NIPS, 2012. 1, 2
- (2012) NIPS
- Srivastava, N.¹ Salakhutdinov, R.²

33
- 84964983441
- arXiv:1409.4842. 1, 7
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv:1409.4842, 2014. 1, 7
- (2014) Going Deeper with Convolutions
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

34
- 84939821075
- arXiv:1411.4556. 2, 6, 7
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. arXiv:1411.4556, 2014. 2, 6, 7
- (2014) Show and Tell: A Neural Image Caption Generator
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

35
- 84867117593
- Wsabie: Scaling up to large vocabulary image annotation
- 2
- J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. IJCAI, 2011. 2
- (2011) IJCAI
- Weston, J.¹ Bengio, S.² Usunier, N.³

36
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- 7
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2014. 7
- (2014) Transactions of the Association for Computational Linguistics
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

37
- 84898772194
- Learning the visual interpretation of sentences
- 1, 2
- C. L. Zitnick, D. Parikh, and L. Vanderwende. Learning the visual interpretation of sentences. ICCV, 2013. 1, 2
- (2013) ICCV
- Zitnick, C.L.¹ Parikh, D.² Vanderwende, L.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.