[3] K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 2003.
[4] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
[5] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[6] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollar, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
[7] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
[8] S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell, and K. Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013.
[10] Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. In ICCV, 2011.
[11] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
[13] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
[15] G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Babytalk: Understanding and generating simple image descriptions. IEEE Trans. on PAMI, 2013.
[16] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. In NIPS Workshop on Deep Learning, 2014.
[18] Y. Pan, T. Yao, T. Mei, H. Li, C.-W. Ngo, and Y. Rui. Clickthrough-based cross-view learning for image search. In SIGIR, 2014.
[19] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002.
[21] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In ICCV, 2013.
[22] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. Imagenet large scale visual recognition challenge. IJCV, 2015.
[23] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[24] R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[26] J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, 2014.
[28] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, 2015.
[29] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, 2015.
[30] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL HLT, 2015.
[32] Z. Wu, X. Wang, Y.-G. Jiang, H. Ye, and X. Xue. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In ACM MM, 2015.
[33] R. Xu, C. Xiong, W. Chen, and J. J. Corso. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In AAAI, 2015.
[34] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015.
[35] T. Yao, T. Mei, and C.-W. Ngo. Learning query and image similarities with ranking canonical correlation analysis. In ICCV, 2015.