[1] L. Anne Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR, 2016.
[2] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
[3] N. Ballas, L. Yao, C. Pal, and A. Courville. Delving deeper into convolutional networks for learning video representations. In ICLR, 2016.
[4] S. Banerjee and A. Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In ACL Workshop, 2005.
[5] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
[6] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv:1504.00325, 2015.
[7] X. Chen and C. Lawrence Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015.
[8] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014.
[9] J. Devlin, H. Cheng, H. Fang, S. Gupta, L. Deng, X. He, G. Zweig, and M. Mitchell. Language models for image captioning: The quirks and what works. In ACL, 2015.
[10] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[11] J. Dong, X. Li, W. Lan, Y. Huo, and C. G. Snoek. Early embedding and late reranking for video captioning. In ACMMM, 2016.
[13] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, 2015.
[14] C. Gan, Z. Gan, X. He, J. Gao, and L. Deng. StyleNet: Generating attractive visual captions with styles. In CVPR, 2017.
[15] C. Gan, T. Yang, and B. Gong. Learning attributes equals multi-source domain generalization. In CVPR, 2016.
[16] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[18] X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars. Guiding long-short term memory for image caption generation. In ICCV, 2015.
[19] J. Jin, K. Fu, R. Cui, F. Sha, and C. Zhang. Aligning where to see and what to tell: Image caption with region-based attention and scene factorization. arXiv:1506.06272, 2015.
[20] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[21] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
[22] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[25] R. Kiros, R. Zemel, and R. R. Salakhutdinov. A multiplicative model for learning distributed text-based attribute representations. In NIPS, 2014.
[26] C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In ACL Workshop, 2004.
[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[29] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
[30] R. Memisevic and G. Hinton. Unsupervised learning of image transformations. In CVPR, 2007.
[31] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
[32] Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui. Jointly modeling embedding and translation to bridge video and language. In CVPR, 2016.
[33] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
[34] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin. Variational autoencoder for deep learning of images, labels and captions. In NIPS, 2016.
[36] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 2015.
[37] R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2014.
[38] J. Song, Z. Gan, and L. Carin. Factored temporal sigmoid belief networks for sequence learning. In ICML, 2016.
[39] I. Sutskever, J. Martens, and G. E. Hinton. Generating text with recurrent neural networks. In ICML, 2011.
[40] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[41] G. W. Taylor and G. E. Hinton. Factored conditional restricted Boltzmann machines for modeling motion style. In ICML, 2009.
[43] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, 2015.
[44] K. Tran, X. He, L. Zhang, J. Sun, C. Carapcea, C. Thrasher, C. Buehler, and C. Sienkiewicz. Rich image captioning in the wild. In CVPR Workshops, 2016.
[46] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, 2015.
[47] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL, 2015.
[49] Q. Wu, C. Shen, L. Liu, A. Dick, and A. v. d. Hengel. What value do explicit high level concepts have in vision to language problems? In CVPR, 2016.
[50] Y. Wu, S. Zhang, Y. Zhang, Y. Bengio, and R. Salakhutdinov. On multiplicative integration with recurrent neural networks. In NIPS, 2016.
[51] J. Xu, T. Mei, T. Yao, and Y. Rui. MSR-VTT: A large video description dataset for bridging video and language. In CVPR, 2016.
[52] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[53] Z. Yang, Y. Yuan, Y. Wu, R. Salakhutdinov, and W. W. Cohen. Review networks for caption generation. In NIPS, 2016.
[54] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
[55] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014.
[56] H. Yu, J. Wang, Z. Huang, Y. Yang, and W. Xu. Video paragraph captioning using hierarchical recurrent neural networks. In CVPR, 2016.