References

[1] J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. In ICLR, 2015.
[2] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2014.
[3] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
[4] X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, pages 2422-2431, 2015.
[5] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014.
[6] M. Denil, L. Bazzani, H. Larochelle, and N. de Freitas. Learning where to attend with deep architectures for image tracking. Neural Computation, 24(8):2151-2184, 2012.
[7] J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
[8] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, pages 2626-2634, 2015.
[9] D. Elliott and F. Keller. Image description using visual dependency representations. In EMNLP, pages 1292-1302, 2013.
[10] V. Escorcia, J. C. Niebles, and B. Ghanem. On the relationship between visual attributes and convolutional networks. In CVPR, pages 1256-1264, 2015.
[11] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. In CVPR, pages 1473-1482, 2015.
[12] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, pages 15-29. Springer, 2010.
[13] Y. Gong, Y. Jia, T. Leung, A. Toshev, and S. Ioffe. Deep convolutional ranking for multilabel image annotation. In ICLR, 2014.
[14] Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In ECCV, pages 529-545. Springer, 2014.
[16] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, June 2015.
[17] C. Koch and S. Ullman. Shifts in selective visual attention: Towards the underlying neural circuitry. In Matters of Intelligence, pages 115-141. Springer, 1987.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097-1105, 2012.
[19] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating image descriptions. In CVPR, 2011.
[20] P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, pages 359-368, 2012.
[21] H. Larochelle and G. E. Hinton. Learning to combine foveal glimpses with a third-order Boltzmann machine. In NIPS, pages 1243-1251, 2010.
[22] R. Lebret, P. O. Pinheiro, and R. Collobert. Simple image description generator via a linear phrase-based approach. In ICLR, 2015.
[23] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, pages 220-228, 2011.
[24] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, June 2015.
[25] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV, 2015.
[26] J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv preprint arXiv:1412.6632, 2014.
[27] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119, 2013.
[28] V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In NIPS, pages 2204-2212, 2014.
[29] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, pages 1532-1543, 2014.
[31] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104-3112, 2014.
[32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[33] Y. Tang, N. Srivastava, and R. R. Salakhutdinov. Learning generative models with visual attention. In NIPS, pages 1808-1816, 2014.
[35] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, pages 3156-3164, 2015.
[36] Q. Wu, C. Shen, A. van den Hengel, L. Liu, and A. Dick. What value do explicit high-level concepts have in vision to language problems? In CVPR, 2016.
[37] K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[38] B. Zhou, V. Jagadeesh, and R. Piramuthu. ConceptLearner: Discovering visual concepts from weakly labeled image collections. In CVPR, June 2015.