[1] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 221-231, Jan. 2013.
[2] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, "Sequential deep learning for human action recognition," in Proc. 2nd Int. Conf. Human Behavior Understanding, 2011, pp. 29-39.
[3] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, "Large-scale video classification with convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 1725-1732.
[5] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," DTIC Document, Fort Belvoir, VA, USA, Tech. Rep. ICS 8506, 1985.
[6] R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," in Neural Computation. Cambridge, MA, USA: MIT Press, 1989.
[7] S. Hochreiter and J. Schmidhuber, "Long short-term memory," in Neural Computation. Cambridge, MA, USA: MIT Press, 1997.
[8] A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 1764-1772.
[9] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. Advances Neural Inf. Process. Syst., 2014, pp. 3104-3112.
[10] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," in Proc. 8th Workshop Syntax Semantics Struct. Statistical Transl., 2014, pp. 103-111.
[11] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele, "Translating video content to natural language descriptions," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 433-440.
[12] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 675-678.
[13] W. Zaremba and I. Sutskever, "Learning to execute," CoRR, vol. abs/1410.4615, 2014, http://arxiv.org/abs/1410.4615.
[14] A. Graves, "Generating sequences with recurrent neural networks," CoRR, vol. abs/1308.0850, 2013, http://arxiv.org/abs/1308.0850.
[15] O. Vinyals, S. V. Ravuri, and D. Povey, "Revisiting recurrent neural networks for robust ASR," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2012, pp. 4085-4088.
[16] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proc. 28th Int. Conf. Mach. Learn., 2011, pp. 1017-1024.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Advances Neural Inf. Process. Syst., 2012, pp. 1106-1114.
[20] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proc. Conf. Empirical Methods Natural Language Process., 2014, pp. 1724-1734.
[21] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, "High accuracy optical flow estimation based on a theory for warping," in Proc. Eur. Conf. Comput. Vis., 2004, pp. 25-36.
[22] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818-833.
[23] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211-252, 2015.
[24] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2009, pp. 248-255.
[25] K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A dataset of 101 human actions classes from videos in the wild," Univ. Central Florida, Orlando, FL, USA, Tech. Rep. CRCV-TR-12-01, 2012.
[26] M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics," J. Artificial Intell. Res., vol. 47, no. 1, pp. 853-899, 2013.
[27] J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille, "Deep captioning with multimodal recurrent neural networks (m-RNN)," presented at the Int. Conf. Learn. Representations, San Diego, CA, USA, 2015.
[28] A. Karpathy, A. Joulin, and L. Fei-Fei, "Deep fragment embeddings for bidirectional image sentence mapping," in Proc. Advances Neural Inf. Process. Syst., 2014, pp. 1889-1897.
[29] R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, "Grounded compositional semantics for finding and describing images with sentences," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 207-218, 2014.
[30] A. Frome et al., "DeViSE: A deep visual-semantic embedding model," in Proc. Advances Neural Inf. Process. Syst., 2013, pp. 2121-2129.
[31] R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Unifying visual-semantic embeddings with multimodal neural language models," CoRR, vol. abs/1411.2539, 2014, http://arxiv.org/abs/1411.2539.
[32] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 67-78, 2014.
[33] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Proc. Eur. Conf. Comput. Vis., Springer, 2014, pp. 740-755.
[34] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, 2002, pp. 311-318.
[35] R. Vedantam, C. L. Zitnick, and D. Parikh, "CIDEr: Consensus-based image description evaluation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 4566-4575.
[38] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 3156-3164.
[39] J. Devlin et al., "Language models for image captioning: The quirks and what works," in Proc. 53rd Annu. Meeting Assoc. Comput. Linguistics, 2015, pp. 100-105.
[40] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille, "Learning like a child: Fast novel visual concept learning from sentence descriptions of images," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 2533-2541.
[41] H. Fang et al., "From captions to visual concepts and back," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 1473-1482.
[42] J. Devlin, S. Gupta, R. B. Girshick, M. Mitchell, and C. L. Zitnick, "Exploring nearest neighbor approaches for image captioning," CoRR, vol. abs/1505.04467, 2015, http://arxiv.org/abs/1505.04467.
[43] J. Donahue et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 2625-2634.
[44] K. Xu et al., "Show, attend and tell: Neural image caption generation with visual attention," presented at the 32nd Int. Conf. Mach. Learn., Lille, France, 2015.
[46] P. Kuznetsova, V. Ordonez, T. L. Berg, and Y. Choi, "TreeTalk: Composition and compression of trees for image descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, no. 10, pp. 351-362, 2014.
[48] A. Rohrbach, M. Rohrbach, W. Qiu, A. Friedrich, M. Pinkal, and B. Schiele, "Coherent multi-sentence video description with variable level of detail," in Proc. German Conf. Pattern Recog. (GCPR). Berlin, Germany: Springer, 2014.
[49] H. Wang, A. Klaser, C. Schmid, and C. Liu, "Dense trajectories and motion boundary descriptors for action recognition," Int. J. Comput. Vis., vol. 103, pp. 60-79, 2013.
[50] H. Wang and C. Schmid, "Action recognition with improved trajectories," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 3551-3558.
[51] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, "HMDB: A large video database for human motion recognition," in Proc. IEEE Int. Conf. Comput. Vis., 2011, pp. 2556-2563.
[52] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, "Action classification in soccer videos with long short-term memory recurrent neural networks," in Proc. 20th Int. Conf. Artificial Neural Netw., 2010, pp. 154-159.
[53] A. Farhadi et al., "Every picture tells a story: Generating sentences from images," in Proc. 11th Eur. Conf. Comput. Vis., 2010, pp. 15-29.
[54] G. Kulkarni et al., "Baby talk: Understanding and generating simple image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1601-1608.
[55] Y. Yang, C. L. Teo, H. Daume III, and Y. Aloimonos, "Corpus-guided sentence generation of natural images," in Proc. Conf. Empirical Methods Natural Language Process., 2011, pp. 444-454.
[57] P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi, "Collective generation of natural image descriptions," in Proc. 50th Annu. Meeting Assoc. Comput. Linguistics: Long Papers, 2012, pp. 359-368.
[58] R. Kiros, R. Salakhutdinov, and R. Zemel, "Multimodal neural language models," in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 595-603.
[59] S. Guadarrama et al., "YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 2712-2719.
[60] M. U. G. Khan, L. Zhang, and Y. Gotoh, "Human focused video description," in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2011, pp. 1480-1487.
[62] P. Das, C. Xu, R. Doell, and J. Corso, "Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2013, pp. 2634-2641.
[63] C. C. Tan, Y.-G. Jiang, and C.-W. Ngo, "Towards textually describing complex video contents with audio-visual concept classifiers," in Proc. 19th ACM Int. Conf. Multimedia, 2011, pp. 655-658.
[64] J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. J. Mooney, "Integrating language and vision to generate natural language descriptions of videos in the wild," in Proc. 25th Int. Conf. Comput. Linguistics, 2014, pp. 1218-1227.
[65] H. Sak et al., "Sequence discriminative distributed training of long short-term memory recurrent neural networks," presented at the 15th Annu. Conf. Int. Speech Commun. Assoc., Singapore, 2014.
[66] J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, "Beyond short snippets: Deep networks for video classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 4694-4702.
[67] S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and F.-F. Li, "Every moment counts: Dense detailed labeling of actions in complex videos," CoRR, vol. abs/1507.05738, 2015, http://arxiv.org/abs/1507.05738.
[68] L. A. Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell, "Deep compositional captioning: Describing novel object categories without paired training data," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2016.
[69] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko, "Translating videos to natural language using deep recurrent neural networks," in Proc. North Amer. Chapter Assoc. Comput. Linguistics-Human Language Technol., 2015, pp. 1494-1504.
[70] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko, "Sequence to sequence - video to text," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4534-4542.
[71] L. Yao et al., "Describing videos by exploiting temporal structure," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4507-4515.
[72] A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele, "Grounding of textual phrases in images by reconstruction," CoRR, vol. abs/1511.03745, 2015, http://arxiv.org/abs/1511.03745.
[73] R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell, "Natural language object retrieval," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2016, pp. 4555-4564.