[1] J. Song et al., "Optimized graph learning using partial tags and multiple features for image and video annotation," IEEE Trans. Image Process., vol. 25, no. 11, pp. 4999-5011, Nov. 2016.
[2] X. Zhu, Z. Huang, J. Cui, and H. T. Shen, "Video-to-shot tag propagation by graph sparse group lasso," IEEE Trans. Multimedia, vol. 15, no. 3, pp. 633-646, Apr. 2013.
[3] Z. Pan, Y. Zhang, and S. Kwong, "Efficient motion and disparity estimation optimization for low complexity multiview video coding," IEEE Trans. Broadcast., vol. 61, no. 2, pp. 166-176, Jun. 2015.
[4] M.-T. Luong, H. Pham, and C. D. Manning, "Effective approaches to attention-based neural machine translation," in Proc. Conf. Empirical Methods Natural Lang. Process., 2015, pp. 1412-1421.
[5] F. Shen, C. Shen, Q. Shi, A. van den Hengel, and Z. Tang, "Inductive hashing on manifolds," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2013, pp. 1562-1569.
[6] J. Song, L. Gao, L. Liu, X. Zhu, and N. Sebe, "Quantization-based hashing: A general framework for scalable image and video retrieval," Pattern Recog., 2017.
[7] C. Gan, T. Yao, K. Yang, Y. Yang, and T. Mei, "You lead, we exceed: Labor-free video concept learning by jointly exploiting web videos and images," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 923-932.
[8] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2015, pp. 3156-3164.
[9] Z. Guo et al., "Attention-based LSTM with semantic consistency for videos captioning," in Proc. ACM Multimedia Conf., 2016, pp. 357-361.
[11] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 2818-2826.
[12] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille, "Deep captioning with multimodal recurrent neural networks (m-RNN)," in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.
[13] K. Xu et al., "Show, attend and tell: Neural image caption generation with visual attention," in Proc. Int. Conf. Mach. Learn., 2015, pp. 2048-2057.
[14] A. Karpathy, A. Joulin, and L. Fei-Fei, "Deep fragment embeddings for bidirectional image sentence mapping," in Proc. Int. Conf. Neural Inform. Process. Syst., 2014, pp. 1889-1897.
[15] X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars, "Guiding the long-short term memory model for image caption generation," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 2407-2415.
[16] S. Venugopalan et al., "Translating videos to natural language using deep recurrent neural networks," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol., Denver, Colorado, USA, May 31-Jun. 5, 2015, pp. 1494-1504.
[18] L. Yao et al., "Describing videos by exploiting temporal structure," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4507-4515.
[19] G. Li, S. Ma, and Y. Han, "Summarization-based video caption via deep neural networks," in Proc. ACM Multimedia Conf., 2015, pp. 1191-1194.
[20] Q. Wu, C. Shen, A. van den Hengel, L. Liu, and A. Dick, "Image captioning with an intermediate attributes layer," CoRR, 2015. [Online]. Available: http://arxiv.org/abs/1506.01144
[21] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
[22] Z. Yang, X. He, J. Gao, L. Deng, and A. Smola, "Stacked attention networks for image question answering," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 21-29.
[23] L. Gao et al., "Graph-without-cut: An ideal graph learning for image segmentation," in Proc. AAAI, 2016, pp. 1188-1194.
[24] J. Song et al., "Joint graph learning and video segmentation via multiple cues and topology calibration," in Proc. ACM Multimedia Conf., 2016, pp. 831-840.
[25] J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo, "Effective multiple feature hashing for large-scale near-duplicate video retrieval," IEEE Trans. Multimedia, vol. 15, no. 8, pp. 1997-2008, Dec. 2013.
[26] L. Sun, K. Jia, D.-Y. Yeung, and B. E. Shi, "Human action recognition using factorized spatio-temporal convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4597-4605.
[27] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, "Learning spatiotemporal features with 3D convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4489-4497.
[28] J. Donahue et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2015, pp. 2625-2634.
[29] M. Auli, M. Galley, C. Quirk, and G. Zweig, "Joint language and translation modeling with recurrent neural networks," in Proc. Conf. Empirical Methods Natural Lang. Process., 2013, pp. 1044-1054.
[35] A. Rohrbach, M. Rohrbach, and B. Schiele, "The long-short story of movie description," in Proc. German Conf. Pattern Recog., 2015, pp. 209-221.
[37] Y. Pan, T. Yao, H. Li, and T. Mei, "Video captioning with transferred semantic attributes," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2017.
[39] Q. Tian and S. Chen, "Cross-heterogeneous-database age estimation through correlation representation learning," Neurocomputing, vol. 238, pp. 286-295, 2017.
[40] Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui, "Jointly modeling embedding and translation to bridge video and language," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 4594-4602.
[41] P. Pan, Z. Xu, Y. Yang, F. Wu, and Y. Zhuang, "Hierarchical recurrent neural encoder for video representation with application to captioning," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 1029-1038.
[42] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo, "Image captioning with semantic attention," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 4651-4659.
[43] Z. Pan, P. Jin, J. Lei, Y. Zhang, X. Sun, and S. Kwong, "Fast reference frame selection based on content similarity for low complexity HEVC encoder," J. Vis. Commun. Image Represent., vol. 40, pp. 516-524, 2016.
[44] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157-166, Mar. 1994.
[46] J. Xu, T. Mei, T. Yao, and Y. Rui, "MSR-VTT: A large video description dataset for bridging video and language," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 5288-5296.
[48] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. 40th Annu. Meet. Assoc. Comput. Linguistics, 2002, pp. 311-318.
[50] R. Vedantam, C. L. Zitnick, and D. Parikh, "CIDEr: Consensus-based image description evaluation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2015, pp. 4566-4575.
[51] R. Xu, C. Xiong, W. Chen, and J. J. Corso, "Jointly modeling deep video and compositional text to bridge vision and language in a unified framework," in Proc. Assoc. Adv. Artif. Intell., 2015, pp. 2346-2352.
[52] H. Yu, J. Wang, Z. Huang, Y. Yang, and W. Xu, "Video paragraph captioning using hierarchical recurrent neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 4584-4593.