[2] Y. Bengio, H. Schwenk, J.-S. Senécal, F. Morin, and J.-L. Gauvain. Neural probabilistic language models. In Innovations in Machine Learning, pages 137-186. Springer, 2006.
[3] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157-166, 1994.
[4] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248-255. IEEE, 2009.
[7] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[9] J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179-211, 1990.
[10] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
[11] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, pages 15-29. Springer, 2010.
[12] A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems, pages 2121-2129, 2013.
[14] Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In ECCV, pages 529-545, 2014.
[16] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research (JAIR), 47:853-899, 2013.
[17] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[18] M. A. Just, S. D. Newman, T. A. Keller, A. McEleney, and P. A. Carpenter. Imagery in sentence comprehension: An fMRI study. NeuroImage, 21(1):112-124, 2004.
[19] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[24] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. In CVPR, pages 1601-1608. IEEE, 2011.
[26] L. R. Lieberman and J. T. Culpepper. Words versus objects: Comparison of free verbal recall. Psychological Reports, 17(3):983-988, 1965.
[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[28] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090, 2014.
[31] T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. Cernocky. Strategies for training large scale neural network language models. In ASRU, pages 196-201. IEEE, 2011.
[32] T. Mikolov and G. Zweig. Context dependent recurrent neural network language model. In SLT, pages 234-239, 2012.
[33] M. Mitchell, X. Han, J. Dodge, A. Mensch, A. Goyal, A. Berg, K. Yamaguchi, T. Berg, K. Stratos, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, pages 747-756. Association for Computational Linguistics, 2012.
[34] A. Paivio, T. B. Rogers, and P. C. Smythe. Why are pictures easier to recall than words? Psychonomic Science, 11(4):137-138, 1968.
[35] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311-318, 2002.
[38] R. Socher, Q. Le, C. Manning, and A. Ng. Grounded compositional semantics for finding and describing images with sentences. In NIPS Deep Learning Workshop, 2013.
[40] R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015.
[41] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096-1103. ACM, 2008.
[43] R. J. Williams and D. Zipser. Experimental analysis of the real-time recurrent learning algorithm. Connection Science, 1(1):87-111, 1989.
[44] K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
[45] Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, 2011.
[46] B. Z. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu. I2T: Image parsing to text description. Proceedings of the IEEE, 98(8):1485-1508, 2010.