[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
[2] P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Weakly supervised action labeling in videos under ordering constraints. In ECCV, 2014.
[3] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014.
[5] T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar. Movie/script: Alignment and parsing of video and text transcription. In ECCV, 2008.
[6] M. Everingham, J. Sivic, and A. Zisserman. "Hello! My name is... Buffy" - Automatic naming of characters in TV video. In BMVC, pages 899-908, 2006.
[7] A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences for images. In ECCV, 2010.
[8] S. Fidler, A. Sharma, and R. Urtasun. A sentence is worth a thousand pixels. In CVPR, 2013.
[9] A. Gupta and L. Davis. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In ECCV, 2008.
[11] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47:853-899, 2013.
[12] N. Kalchbrenner and P. Blunsom. Recurrent continuous translation models. In EMNLP, pages 1700-1709, 2013.
[13] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[15] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. CoRR, abs/1411.2539, 2014.
[16] R. Kiros, Y. Zhu, R. Salakhutdinov, R. Zemel, A. Torralba, R. Urtasun, and S. Fidler. Skip-thought vectors. arXiv preprint, 2015.
[17] C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? Text-to-image coreference. In CVPR, 2014.
[18] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. Berg, and T. Berg. Baby talk: Understanding and generating simple image descriptions. In CVPR, 2011.
[19] D. Lin, S. Fidler, C. Kong, and R. Urtasun. Visual semantic search: Retrieving videos via complex textual queries. In CVPR, pages 2657-2664, 2014.
[20] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, pages 740-755, 2014.
[21] X. Lin and D. Parikh. Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks. In CVPR, 2015.
[22] M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS, 2014.
[23] J. Malmaud, J. Huang, V. Rathod, N. Johnston, A. Rabinovich, and K. Murphy. What's cookin'? Interpreting cooking videos using text, speech and vision. In NAACL, 2015.
[24] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv:1410.1090, 2014.
[25] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[26] K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311-318, 2002.
[28] V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking people in videos with "their" names using coreference resolution. In ECCV, pages 95-110, 2014.
[29] V. Ramanathan, P. Liang, and L. Fei-Fei. Video event understanding using natural language descriptions. In ICCV, 2013.
[30] A. Rohrbach, M. Rohrbach, N. Tandon, and B. Schiele. A dataset for movie description. In CVPR, 2015.
[31] P. Sankar, C. V. Jawahar, and A. Zisserman. Subtitle-free movie to script alignment. In BMVC, 2009.
[32] A. Schwing, T. Hazan, M. Pollefeys, and R. Urtasun. Efficient structured prediction with latent variables for general graphical models. In ICML, 2012.
[33] J. Sivic, M. Everingham, and A. Zisserman. "Who are you?" - Learning person specific classifiers from video. In CVPR, pages 1145-1152, 2009.
[34] R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2:207-218, 2014.
[35] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[36] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
[37] M. Tapaswi, M. Bauml, and R. Stiefelhagen. Book2Movie: Aligning video scenes with book chapters. In CVPR, 2015.
[38] M. Tapaswi, M. Bauml, and R. Stiefelhagen. Aligning plot synopses to videos for story-based retrieval. IJMIR, 4:3-16, 2015.
[39] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. J. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. CoRR, abs/1312.6229, 2014.
[40] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. arXiv:1411.4555, 2014.
[41] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv:1502.03044, 2015.
[42] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using Places database. In NIPS, 2014.