-
1
-
-
84959502295
-
-
arXiv 1505.00468
-
Antol, S.; Agrawal, A.; Lu, J.; Mitchell, M.; Batra, D.; Zitnick, C. L.; and Parikh, D. 2015. VQA: visual question answering. arXiv 1505.00468.
-
(2015)
VQA: Visual Question Answering
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
3
-
-
84890527827
-
Improving deep neural networks for LVCSR using rectified linear units and dropout
-
Dahl, G. E.; Sainath, T. N.; and Hinton, G. E. 2013. Improving deep neural networks for LVCSR using rectified linear units and dropout. In ICASSP.
-
(2013)
ICASSP
-
-
Dahl, G.E.1
Sainath, T.N.2
Hinton, G.E.3
-
4
-
-
84944046597
-
-
arXiv 1411.4389
-
Donahue, J.; Hendricks, L. A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; and Darrell, T. 2014. Long-term recurrent convolutional networks for visual recognition and description. arXiv 1411.4389.
-
(2014)
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
5
-
-
84944115860
-
-
arXiv 1411.4952
-
Fang, H.; Gupta, S.; Iandola, F. N.; Srivastava, R.; Deng, L.; Dollár, P.; Gao, J.; He, X.; Mitchell, M.; Platt, J. C.; Zitnick, C. L.; and Zweig, G. 2014. From captions to visual concepts and back. arXiv 1411.4952.
-
(2014)
From Captions to Visual Concepts and Back
-
-
Fang, H.1
Gupta, S.2
Iandola, F.N.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Zitnick, C.L.11
Zweig, G.12
-
6
-
-
84898958665
-
DeViSE: A deep visual-semantic embedding model
-
Frome, A.; Corrado, G.; Shlens, J.; Bengio, S.; Dean, J.; Ranzato, M.; and Mikolov, T. 2013. DeViSE: A deep visual-semantic embedding model. In NIPS.
-
(2013)
NIPS
-
-
Frome, A.1
Corrado, G.2
Shlens, J.3
Bengio, S.4
Dean, J.5
Ranzato, M.6
Mikolov, T.7
-
7
-
-
84957033954
-
-
arXiv 1505.05612
-
Gao, H.; Mao, J.; Zhou, J.; Huang, Z.; Wang, L.; and Xu, W. 2015. Are you talking to a machine? Dataset and methods for multilingual image question answering. arXiv 1505.05612.
-
(2015)
Are you Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
-
-
Gao, H.1
Mao, J.2
Zhou, J.3
Huang, Z.4
Wang, L.5
Xu, W.6
-
9
-
-
84937936034
-
Convolutional neural network architectures for matching natural language sentences
-
Hu, B.; Lu, Z.; Li, H.; and Chen, Q. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS.
-
(2014)
NIPS
-
-
Hu, B.1
Lu, Z.2
Li, H.3
Chen, Q.4
-
10
-
-
84906922163
-
A convolutional neural network for modelling sentences
-
Kalchbrenner, N.; Grefenstette, E.; and Blunsom, P. 2014. A convolutional neural network for modelling sentences. In ACL.
-
(2014)
ACL
-
-
Kalchbrenner, N.1
Grefenstette, E.2
Blunsom, P.3
-
12
-
-
84937843643
-
Deep fragment embeddings for bidirectional image sentence mapping
-
Karpathy, A.; Joulin, A.; and Li, F.-F. 2014. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS.
-
(2014)
NIPS
-
-
Karpathy, A.1
Joulin, A.2
Li, F.-F.3
-
13
-
-
84961376850
-
Convolutional neural networks for sentence classification
-
Kim, Y. 2014. Convolutional neural networks for sentence classification. In EMNLP.
-
(2014)
EMNLP
-
-
Kim, Y.1
-
16
-
-
84965125568
-
Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotation
-
Klein, B.; Lev, G.; Sadeh, G.; and Wolf, L. 2015. Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotation. In CVPR.
-
(2015)
CVPR
-
-
Klein, B.1
Lev, G.2
Sadeh, G.3
Wolf, L.4
-
18
-
-
84973864182
-
Multimodal convolutional neural networks for matching image and sentence
-
Ma, L.; Lu, Z.; Shang, L.; and Li, H. 2015. Multimodal convolutional neural networks for matching image and sentence. In ICCV.
-
(2015)
ICCV
-
-
Ma, L.1
Lu, Z.2
Shang, L.3
Li, H.4
-
19
-
-
84937822746
-
A multi-world approach to question answering about real-world scenes based on uncertain input
-
Malinowski, M., and Fritz, M. 2014a. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS.
-
(2014)
NIPS
-
-
Malinowski, M.1
Fritz, M.2
-
23
-
-
84939821073
-
-
arXiv 1412.6632
-
Mao, J.; Xu, W.; Yang, Y.; Wang, J.; and Yuille, A. L. 2014a. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv 1412.6632.
-
(2014)
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.L.5
-
24
-
-
84951072975
-
-
arXiv 1410.1090
-
Mao, J.; Xu, W.; Yang, Y.; Wang, J.; and Yuille, A. L. 2014b. Explain images with multimodal recurrent neural networks. arXiv 1410.1090.
-
(2014)
Explain Images with Multimodal Recurrent Neural Networks
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.L.5
-
26
-
-
85007243154
-
Mutual learning of an object concept and language model based on MLDA and NPYLM
-
Nakamura, T.; Nagai, T.; Funakoshi, K.; Nagasaka, S.; Taniguchi, T.; and Iwahashi, N. 2013. Mutual learning of an object concept and language model based on MLDA and NPYLM. In IROS.
-
(2013)
IROS
-
-
Nakamura, T.1
Nagai, T.2
Funakoshi, K.3
Nagasaka, S.4
Taniguchi, T.5
Iwahashi, N.6
-
27
-
-
85162522202
-
Im2Text: Describing images using 1 million captioned photographs
-
Ordonez, V.; Kulkarni, G.; and Berg, T. L. 2011. Im2Text: Describing images using 1 million captioned photographs. In NIPS.
-
(2011)
NIPS
-
-
Ordonez, V.1
Kulkarni, G.2
Berg, T.L.3
-
30
-
-
84964474107
-
Grounded compositional semantics for finding and describing images with sentences
-
Socher, R.; Karpathy, A.; Le, Q. V.; Manning, C. D.; and Ng, A. Y. 2014. Grounded compositional semantics for finding and describing images with sentences. In TACL.
-
(2014)
TACL
-
-
Socher, R.1
Karpathy, A.2
Le, Q.V.3
Manning, C.D.4
Ng, A.Y.5
-
31
-
-
84937522268
-
Going deeper with convolutions
-
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In CVPR.
-
(2015)
CVPR
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
33
-
-
85146676791
-
Verb semantics and lexical selection
-
Wu, Z., and Palmer, M. S. 1994. Verb semantics and lexical selection. In ACL.
-
(1994)
ACL
-
-
Wu, Z.1
Palmer, M.S.2
-
34
-
-
84939821074
-
-
arXiv 1502.03044
-
Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; and Bengio, Y. 2015a. Show, attend and tell: Neural image caption generation with visual attention. arXiv 1502.03044.
-
(2015)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhutdinov, R.6
Zemel, R.7
Bengio, Y.8
-
35
-
-
84952349307
-
Jointly modeling deep video and compositional text to bridge vision and language in a unified framework
-
Xu, R.; Xiong, C.; Chen, W.; and Corso, J. 2015b. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In AAAI.
-
(2015)
AAAI
-
-
Xu, R.1
Xiong, C.2
Chen, W.3
Corso, J.4