1. Nicolas Ballas, Li Yao, Chris Pal, and Aaron Courville. Delving deeper into convolutional networks for learning video representations. ICLR, 2016.
2. A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, et al. Video in sentences out. UAI, 2012.
4. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003.
5. James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. JMLR, 2012.
6. Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
7. Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP, 2014.
8. Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, and Margaret Mitchell. Language models for image captioning: The quirks and what works. arXiv preprint arXiv:1505.01809, 2015.
9. Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, and C. Lawrence Zitnick. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
10. Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. Long-term recurrent convolutional networks for visual recognition and description. CVPR, 2015.
11. Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt, et al. From captions to visual concepts and back. CVPR, 2015.
12. Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, and Kate Saenko. YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. ICCV, 2013.
13. Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
17. A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CVPR, 2014.
19. Atsuhiro Kojima, Takeshi Tamura, and Kunio Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. IJCV, 2002.
20. Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. BabyTalk: Understanding and generating simple image descriptions. PAMI, 2013.
21. Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg, and Yejin Choi. Collective generation of natural image descriptions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 359–368. Association for Computational Linguistics, 2012.
22. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision - ECCV 2014, pages 740–755. Springer, 2014.
23. Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). ICLR, 2015.
24. Margaret Mitchell, Xufeng Han, Jesse Dodge, Alyssa Mensch, Amit Goyal, Alex Berg, Kota Yamaguchi, Tamara Berg, Karl Stratos, and Hal Daumé III. Midge: Generating image descriptions from computer vision detections. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 747–756. Association for Computational Linguistics, 2012.
25. Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, and Anthony Dick. What value high level concepts in vision to language problems? arXiv preprint arXiv:1506.01144, 2015.
28. Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, and Bernt Schiele. Translating video content to natural language descriptions. ICCV, 2013.
31. Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. CIDEr: Consensus-based image description evaluation. CVPR, 2015.
32. Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, and Kate Saenko. Sequence to sequence - video to text. ICCV, 2015.
33. Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, and Kate Saenko. Translating videos to natural language using deep recurrent neural networks. NAACL, 2015.
34. Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. CVPR, 2014.
35. Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, and Kate Saenko. A multi-scale multiple instance video description network. arXiv preprint arXiv:1505.05914, 2015.
36. Kelvin Xu, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. ICML, 2015.
37. Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, and Aaron Courville. Describing videos by exploiting temporal structure. ICCV, 2015.
39. Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, and Wei Xu. Video paragraph captioning using hierarchical recurrent neural networks. arXiv preprint arXiv:1510.07712, 2015.