[2] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, pages 190-200. Association for Computational Linguistics, 2011.
[4] K. Cho, A. Courville, and Y. Bengio. Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia, 17(11):1875-1886, 2015.
[5] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, T. Darrell, and K. Saenko. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, pages 2625-2634, 2015.
[6] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, pages 1473-1482, 2015.
[7] L. Gao, J. Song, F. Nie, Y. Yan, N. Sebe, and H. T. Shen. Optimal graph learning with partial tags and multiple features for image and video annotation. In CVPR, pages 4371-4379, 2015.
[8] L. Gao, J. Song, F. Nie, F. Zou, N. Sebe, and H. T. Shen. Graph-without-cut: An ideal graph learning for image segmentation. In AAAI, pages 1188-1194, 2016.
[10] A. Karpathy, A. Joulin, and F. F. F. Li. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, pages 1889-1897, 2014.
[11] G. Li, S. Ma, and Y. Han. Summarization-based video caption via deep neural networks. In ACM Multimedia, pages 1191-1194. ACM, 2015.
[12] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090, 2014.
[13] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In ACL, pages 311-318. Association for Computational Linguistics, 2002.
[14] J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo. Effective multiple feature hashing for large-scale near-duplicate video retrieval. IEEE Transactions on Multimedia, 15(8):1997-2008, 2013.
[15] J. Song, Y. Yang, Y. Yang, Z. Huang, and H. T. Shen. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In SIGMOD, pages 785-796, 2013.
[17] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, pages 1-9, 2015.
[18] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. arXiv preprint arXiv:1412.0767, 2014.
[19] R. Vedantam, C. Lawrence Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, pages 4566-4575, 2015.
[20] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, pages 4534-4542, 2015.
[21] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:1412.4729, 2014.
[22] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, pages 3156-3164, 2015.
[23] Q. Wu, C. Shen, A. v. d. Hengel, L. Liu, and A. Dick. Image captioning with an intermediate attributes layer. arXiv preprint arXiv:1506.01144, 2015.
[24] K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
[25] R. Xu, C. Xiong, W. Chen, and J. J. Corso. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In AAAI, pages 2346-2352. Citeseer, 2015.
[26] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, pages 4507-4515, 2015.