SCOPUS Information Search Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 1029-1038

Hierarchical recurrent neural encoder for video representation with application to captioning

(5) Pan, Pingbo (a); Xu, Zhongwen (b); Yang, Yi (b); Wu, Fei (a); Zhuang, Yueting (a)

a ZHEJIANG UNIVERSITY (China)

b UNIVERSITY OF TECHNOLOGY SYDNEY (Australia)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; NEURAL NETWORKS; VIDEO RECORDING;

CONVOLUTIONAL NEURAL NETWORK; DIFFERENT GRANULARITIES; INFORMATION FLOWS; STATE OF THE ART; TEMPORAL INFORMATION; TEMPORAL STRUCTURES; VIDEO REPRESENTATIONS; VIDEO-CONTENT ANALYSIS;

PATTERN RECOGNITION;

EID: 84986290372 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.117 Document Type: Conference Paper

Times cited: 441

References (45)

1
- 85083953689
- Neural machine translation by jointly learning to align and translate
- D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
- (2015) ICLR
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

2
- 84897544737
- Theano: New features and speed improvements
- F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio. Theano: New features and speed improvements. In NIPS Workshop, 2012.
- (2012) NIPS Workshop
- Bastien, F.¹ Lamblin, P.² Pascanu, R.³ Bergstra, J.⁴ Goodfellow, I.J.⁵ Bergeron, A.⁶ Bouchard, N.⁷ Bengio, Y.⁸

3
- 0028392483
- Learning long-term dependencies with gradient descent is difficult
- Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2): 157-166, 1994.
- (1994) IEEE Transactions on Neural Networks, vol. 5, Issue 2, pp. 157-166
- Bengio, Y.¹ Simard, P.² Frasconi, P.³

4
- 84862288320
- Theano: A CPU and GPU math expression compiler
- J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math expression compiler. In SciPy, June 2010.
- (2010) SciPy
- Bergstra, J.¹ Breuleux, O.² Bastien, F.³ Lamblin, P.⁴ Pascanu, R.⁵ Desjardins, G.⁶ Turian, J.⁷ Warde-Farley, D.⁸ Bengio, Y.⁹

5
- 84859089502
- Collecting highly parallel data for paraphrase evaluation
- D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
- (2011) ACL
- Chen, D.L.¹ Dolan, W.B.²

6
- 84952349295
- Microsoft COCO captions: Data collection and evaluation server
- X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
- (2015) arXiv preprint arXiv:1504.00325
- Chen, X.¹ Fang, H.² Lin, T.-Y.³ Vedantam, R.⁴ Gupta, S.⁵ Dollar, P.⁶ Zitnick, C.L.⁷

7
- 84997355957
- Learning phrase representations using RNN encoder-decoder for statistical machine translation
- K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2015.
- (2015) EMNLP
- Cho, K.¹ Van Merriënboer, B.² Gulcehre, C.³ Bahdanau, D.⁴ Bougares, F.⁵ Schwenk, H.⁶ Bengio, Y.⁷

8
- 85107661995
- Meteor universal: Language specific translation evaluation for any target language
- M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In EACL, 2014.
- (2014) EACL
- Denkowski, M.¹ Lavie, A.²

9
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
- (2015) CVPR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

10
- 84897543523
- Maxout networks
- I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In ICML, 2013.
- (2013) ICML
- Goodfellow, I.¹ Warde-Farley, D.² Mirza, M.³ Courville, A.⁴ Bengio, Y.⁵

11
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8): 1735-1780, 1997.
- (1997) Neural Computation, vol. 9, Issue 8, pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

12
- 84969584486
- Batch normalization: Accelerating deep network training by reducing internal covariate shift
- S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
- (2015) ICML
- Ioffe, S.¹ Szegedy, C.²

13
- 77956004473
- Aggregating local descriptors into a compact image representation
- H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In CVPR, 2010.
- (2010) CVPR
- Jégou, H.¹ Douze, M.² Schmid, C.³ Pérez, P.⁴

14
- 84870183903
- 3D convolutional neural networks for human action recognition
- S. Ji, W. Xu, M. Yang, and K. Yu. 3D convolutional neural networks for human action recognition. TPAMI, 35(1): 221-231, 2013.
- (2013) TPAMI, vol. 35, Issue 1, pp. 221-231
- Ji, S.¹ Xu, W.² Yang, M.³ Yu, K.⁴

15
- 84911364368
- Large-scale video classification with convolutional neural networks
- A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
- (2014) CVPR
- Karpathy, A.¹ Toderici, G.² Shetty, S.³ Leung, T.⁴ Sukthankar, R.⁵ Fei-Fei, L.⁶

16
- 85083951076
- ADAM: A method for stochastic optimization
- D. Kingma and J. Ba. ADAM: A method for stochastic optimization. In ICLR, 2015.
- (2015) ICLR
- Kingma, D.¹ Ba, J.²

17
- 84876231242
- ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

18
- 26944501715
- ROUGE: A package for automatic evaluation of summaries
- C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In ACL workshop, 2004.
- (2004) ACL Workshop
- Lin, C.-Y.¹

19
- 85117622017
- The Stanford CoreNLP natural language processing toolkit
- C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL, 2014.
- (2014) ACL
- Manning, C.D.¹ Surdeanu, M.² Bauer, J.³ Finkel, J.⁴ Bethard, S.J.⁵ McClosky, D.⁶

20
- 84959228762
- Beyond short snippets: Deep networks for video classification
- J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. In CVPR, 2015.
- (2015) CVPR
- Ng, J.Y.-H.¹ Hausknecht, M.² Vijayanarasimhan, S.³ Vinyals, O.⁴ Monga, R.⁵ Toderici, G.⁶

21
- 84986332702
- Jointly modeling embedding and translation to bridge video and language
- Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui. Jointly modeling embedding and translation to bridge video and language. In CVPR, 2016.
- (2016) CVPR
- Pan, Y.¹ Mei, T.² Yao, T.³ Li, H.⁴ Rui, Y.⁵

22
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
- (2002) ACL
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

23
- 84907009416
- How to construct deep recurrent neural networks
- R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio. How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026, 2013.
- (2013) arXiv preprint arXiv:1312.6026
- Pascanu, R.¹ Gulcehre, C.² Cho, K.³ Bengio, Y.⁴

24
- 84959211977
- A dataset for movie description
- A. Rohrbach, M. Rohrbach, N. Tandon, and B. Schiele. A dataset for movie description. In CVPR, 2015.
- (2015) CVPR
- Rohrbach, A.¹ Rohrbach, M.² Tandon, N.³ Schiele, B.⁴

25
- 84883487458
- Image classification with the fisher vector: Theory and practice
- J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the Fisher vector: Theory and practice. IJCV, 105(3): 222-245, 2013.
- (2013) IJCV, vol. 105, Issue 3, pp. 222-245
- Sánchez, J.¹ Perronnin, F.² Mensink, T.³ Verbeek, J.⁴

26
- 84937862424
- Two-stream convolutional networks for action recognition in videos
- K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, 2014.
- (2014) NIPS
- Simonyan, K.¹ Zisserman, A.²

27
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- (2015) ICLR
- Simonyan, K.¹ Zisserman, A.²

28
- 51449089344
- Video Google: A text retrieval approach to object matching in videos
- J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In CVPR, 2003.
- (2003) CVPR
- Sivic, J.¹ Zisserman, A.²

29
- 84958256008
- A hierarchical recurrent encoder-decoder for generative context-aware query suggestion
- A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. G. Simonsen, and J.-Y. Nie. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In CIKM, 2015.
- (2015) CIKM
- Sordoni, A.¹ Bengio, Y.² Vahabi, H.³ Lioma, C.⁴ Simonsen, J.G.⁵ Nie, J.-Y.⁶

30
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1): 1929-1958, 2014.
- (2014) JMLR, vol. 15, Issue 1, pp. 1929-1958
- Srivastava, N.¹ Hinton, G.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

31
- 84928547704
- Sequence to sequence learning with neural networks
- I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
- (2014) NIPS
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

32
- 84937522268
- Going deeper with convolutions
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
- (2015) CVPR
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

33
- 84959932469
- Integrating language and vision to generate natural language descriptions of videos in the wild
- J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, 2014.
- (2014) COLING
- Thomason, J.¹ Venugopalan, S.² Guadarrama, S.³ Saenko, K.⁴ Mooney, R.⁵

34
- 84959246420
- Using descriptive video services to create a large data source for video annotation research
- A. Torabi, C. Pal, H. Larochelle, and A. Courville. Using descriptive video services to create a large data source for video annotation research. arXiv preprint arXiv:1503.01070, 2015.
- (2015) arXiv preprint arXiv:1503.01070
- Torabi, A.¹ Pal, C.² Larochelle, H.³ Courville, A.⁴

35
- 84986298009
- C3D: Generic features for video analysis
- D. Tran, L. D. Bourdev, R. Fergus, L. Torresani, and M. Paluri. C3D: generic features for video analysis. In ICCV, 2015.
- (2015) ICCV
- Tran, D.¹ Bourdev, L.D.² Fergus, R.³ Torresani, L.⁴ Paluri, M.⁵

36
- 84956980995
- CIDEr: Consensus-based image description evaluation
- R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015.
- (2015) CVPR
- Vedantam, R.¹ Zitnick, C.L.² Parikh, D.³

37
- 84973882730
- Sequence to sequence - video to text
- S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, 2015.
- (2015) ICCV
- Venugopalan, S.¹ Rohrbach, M.² Donahue, J.³ Mooney, R.J.⁴ Darrell, T.⁵ Saenko, K.⁶

38
- 84959876769
- Translating videos to natural language using deep recurrent neural networks
- S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL-HLT, 2015.
- (2015) NAACL-HLT
- Venugopalan, S.¹ Xu, H.² Donahue, J.³ Rohrbach, M.⁴ Mooney, R.⁵ Saenko, K.⁶

39
- 80052877143
- Action recognition by dense trajectories
- H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In CVPR, 2011.
- (2011) CVPR
- Wang, H.¹ Kläser, A.² Schmid, C.³ Liu, C.-L.⁴

40
- 84898805910
- Action recognition with improved trajectories
- H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
- (2013) ICCV
- Wang, H.¹ Schmid, C.²

41
- 84959226659
- A discriminative CNN video representation for event detection
- Z. Xu, Y. Yang, and A. G. Hauptmann. A discriminative CNN video representation for event detection. In CVPR, 2015.
- (2015) CVPR
- Xu, Z.¹ Yang, Y.² Hauptmann, A.G.³

42
- 84973884896
- Describing videos by exploiting temporal structure
- L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015.
- (2015) ICCV
- Yao, L.¹ Torabi, A.² Cho, K.³ Ballas, N.⁴ Pal, C.⁵ Larochelle, H.⁶ Courville, A.⁷

43
- 84986275061
- Video paragraph captioning using hierarchical recurrent neural networks
- H. Yu, J. Wang, Z. Huang, Y. Yang, and W. Xu. Video paragraph captioning using hierarchical recurrent neural networks. In CVPR, 2016.
- (2016) CVPR
- Yu, H.¹ Wang, J.² Huang, Z.³ Yang, Y.⁴ Xu, W.⁵

44
- 84977575171
- Learning to execute
- W. Zaremba and I. Sutskever. Learning to execute. In ICLR, 2015.
- (2015) ICLR
- Zaremba, W.¹ Sutskever, I.²

45
- 84944053926
- Recurrent neural network regularization
- W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014.
- (2014) arXiv preprint arXiv:1409.2329
- Zaremba, W.¹ Sutskever, I.² Vinyals, O.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.