[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. ICLR, 2015.
[2] A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, et al. Video in sentences out. UAI, 2012.
[3] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio. Theano: New features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.
[4] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.
[5] P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Weakly supervised action labeling in videos under ordering constraints. In ECCV, 2014.
[6] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
[7] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv:1504.00325, 2015.
[8] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, Oct. 2014.
[9] N. Dalal, B. Triggs, and C. Schmid. Human detection using oriented histograms of flow and appearance. In ECCV, 2006.
[10] M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In EACL Workshop, 2014.
[11] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. CVPR, 2015.
[12] A. Gaidon, Z. Harchaoui, and C. Schmid. Temporal localization of actions with actoms. PAMI, 2013.
[15] S. Ji, W. Xu, M. Yang, and K. Yu. 3D convolutional neural networks for human action recognition. PAMI, 2013.
[16] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.
[17] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2014.
[18] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, IEEE, 2014.
[19] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. ACL, 2014.
[20] A. Kojima, T. Tamura, and K. Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. IJCV, 2002.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[23] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
[24] M. Ranzato, A. Szlam, J. Bruna, M. Mathieu, R. Collobert, and S. Chopra. Video (language) modeling: A baseline for generative models of natural videos. arXiv:1412.6604, 2014.
[26] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In ICCV, 2013.
[27] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. ICLR, 2014.
[28] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. NIPS, 2014.
[30] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CVPR, 2015.
[32] K. Tang, L. Fei-Fei, and D. Koller. Learning latent temporal structure for complex event detection. In CVPR, IEEE, 2012.
[33] G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler. Convolutional learning of spatio-temporal features. In Computer Vision – ECCV 2010, pages 140–153. Springer, 2010.
[34] J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, 2014.
[36] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. C3D: Generic features for video analysis. arXiv:1412.0767, 2014.
[37] R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. CVPR, 2015.
[38] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. NAACL, 2015.
[40] H. Wang, M. M. Ullah, A. Kläser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2009.
[41] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. ICML, 2015.