1. D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
2. S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS, 2015.
3. H. Bilen and A. Vedaldi. Weakly supervised deep detection networks. In CVPR, 2016.
4. J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio. Attention-based models for speech recognition. In NIPS, 2015.
6. R. Cinbis, J. Verbeek, and C. Schmid. Multi-fold MIL training for weakly supervised object localization. In CVPR, 2014.
7. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
8. J. Donahue, L. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
9. H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, C. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
10. R. Girshick. Fast R-CNN. In ICCV, 2015.
12. K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.
15. J. Jin, K. Fu, R. Cui, F. Sha, and C. Zhang. Aligning where to see and what to tell: Image caption with region-based attention and scene factorization. arXiv:1506.06272, 2015.
16. J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully convolutional localization networks for dense captioning. In CVPR, 2016.
17. A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
18. D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
20. T. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
21. C. Liu, J. Mao, F. Sha, and A. Yuille. Attention correctness in neural image captioning. In AAAI, 2017.
22. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. Berg. SSD: Single shot multibox detector. In ECCV, 2016.
23. J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
24. B. Plummer, L. Wang, C. Cervantes, J. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2015.
25. M. Ranzato, S. Chopra, M. Auli, and W. Zaremba. Sequence level training with recurrent neural networks. In ICLR, 2016.
26. S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.
27. A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. Grounding of textual phrases in images by reconstruction. In ECCV, 2016.
28. O. Russakovsky, Y. Lin, K. Yu, and L. Fei-Fei. Object-centric spatial pooling for image classification. In ECCV, 2012.
29. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
30. I. Sutskever, O. Vinyals, and Q. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
31. J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. IJCV, 104(2):154-171, 2013.
33. Q. Wu, C. Shen, L. Liu, A. Dick, and A. van den Hengel. What value do explicit high level concepts have in vision to language problems? In CVPR, 2016.
34. K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
35. Z. Yang, Y. Yuan, Y. Wu, R. Salakhutdinov, and W. Cohen. Encode, review, and decode: Reviewer module for caption generation. In NIPS, 2016.
36. L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015.
37. S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and L. Fei-Fei. Every moment counts: Dense detailed labeling of actions in complex videos.
38. Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
39. C. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. In ECCV, 2014.