[1] S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 221-231, Jan. 2013.
[2] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, "Sequential deep learning for human action recognition," in Proc. 2nd Int. Conf. Human Behavior Understanding, 2011, pp. 29-39.
[3] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, "Large-scale video classification with convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 1725-1732.
[5] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," DTIC Document, Fort Belvoir, VA, USA, Tech. Rep. ICS 8506, 1985.
[6] R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," in Neural Computation. Cambridge, MA, USA: MIT Press, 1989.
[7] S. Hochreiter and J. Schmidhuber, "Long short-term memory," in Neural Computation. Cambridge, MA, USA: MIT Press, 1997.
[8] A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 1764-1772.
[9] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. Advances Neural Inf. Process. Syst., 2014, pp. 3104-3112.
[10] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: Encoder-decoder approaches," in Proc. 8th Workshop Syntax Semantics Struct. Statistical Transl., 2014, pp. 103-111.
[11] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele, "Translating video content to natural language descriptions," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 433-440.
[12] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 675-678.
[13] W. Zaremba and I. Sutskever, "Learning to execute," CoRR, vol. abs/1410.4615, 2014, http://arxiv.org/abs/1410.4615.
[14] A. Graves, "Generating sequences with recurrent neural networks," CoRR, vol. abs/1308.0850, 2013, http://arxiv.org/abs/1308.0850.
[15] O. Vinyals, S. V. Ravuri, and D. Povey, "Revisiting recurrent neural networks for robust ASR," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2012, pp. 4085-4088.
[16] I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proc. 28th Int. Conf. Mach. Learn., 2011, pp. 1017-1024.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Advances Neural Inf. Process. Syst., 2012, pp. 1106-1114.
[20] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proc. Conf. Empirical Methods Natural Language Process., 2014, pp. 1724-1734.
[21] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, "High accuracy optical flow estimation based on a theory for warping," in Proc. Eur. Conf. Comput. Vis., 2004, pp. 25-36.
[22] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818-833.
[23] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211-252, 2015.
[24] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2009, pp. 248-255.
[25] K. Soomro, A. R. Zamir, and M. Shah, "UCF101: A dataset of 101 human actions classes from videos in the wild," Univ. Central Florida, Orlando, FL, USA, Tech. Rep. CRCV-TR-12-01, 2012.
[26] M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics," J. Artificial Intell. Res., vol. 47, no. 1, pp. 853-899, 2013.
[27] J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille, "Deep captioning with multimodal recurrent neural networks (m-RNN)," presented at the Int. Conf. Learn. Representations, San Diego, CA, USA, 2015.
[28] A. Karpathy, A. Joulin, and L. Fei-Fei, "Deep fragment embeddings for bidirectional image sentence mapping," in Proc. Advances Neural Inf. Process. Syst., 2014, pp. 1889-1897.
[29] R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, "Grounded compositional semantics for finding and describing images with sentences," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 207-218, 2014.
[30] A. Frome et al., "DeViSE: A deep visual-semantic embedding model," in Proc. Advances Neural Inf. Process. Syst., 2013, pp. 2121-2129.
[31] R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Unifying visual-semantic embeddings with multimodal neural language models," CoRR, vol. abs/1411.2539, 2014, http://arxiv.org/abs/1411.2539.
[32] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 67-78, 2014.
[33] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Proc. Eur. Conf. Comput. Vis., Springer, 2014, pp. 740-755.
[34] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, 2002, pp. 311-318.
[35] R. Vedantam, C. L. Zitnick, and D. Parikh, "CIDEr: Consensus-based image description evaluation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 4566-4575.
[38] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 3156-3164.
[39] J. Devlin et al., "Language models for image captioning: The quirks and what works," in Proc. 53rd Annu. Meeting Assoc. Comput. Linguistics, 2015, pp. 100-105.
[40] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille, "Learning like a child: Fast novel visual concept learning from sentence descriptions of images," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 2533-2541.
[41] H. Fang et al., "From captions to visual concepts and back," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 1473-1482.
[42] J. Devlin, S. Gupta, R. B. Girshick, M. Mitchell, and C. L. Zitnick, "Exploring nearest neighbor approaches for image captioning," CoRR, vol. abs/1505.04467, 2015, http://arxiv.org/abs/1505.04467.
[43] J. Donahue et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 2625-2634.
[44] K. Xu et al., "Show, attend and tell: Neural image caption generation with visual attention," presented at the 32nd Int. Conf. Mach. Learn., Lille, France, 2015.
[46] P. Kuznetsova, V. Ordonez, T. L. Berg, and Y. Choi, "TreeTalk: Composition and compression of trees for image descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, no. 10, pp. 351-362, 2014.
[48] A. Rohrbach, M. Rohrbach, W. Qiu, A. Friedrich, M. Pinkal, and B. Schiele, "Coherent multi-sentence video description with variable level of detail," in Proc. German Conf. Pattern Recog. (GCPR). Berlin, Germany: Springer, 2014.
[49] H. Wang, A. Klaser, C. Schmid, and C. Liu, "Dense trajectories and motion boundary descriptors for action recognition," Int. J. Comput. Vis., vol. 103, pp. 60-79, 2013.
[50] H. Wang and C. Schmid, "Action recognition with improved trajectories," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 3551-3558.
[51] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, "HMDB: A large video database for human motion recognition," in Proc. IEEE Int. Conf. Comput. Vis., 2011, pp. 2556-2563.
[52] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, "Action classification in soccer videos with long short-term memory recurrent neural networks," in Proc. 20th Int. Conf. Artificial Neural Netw., 2010, pp. 154-159.
[53] A. Farhadi et al., "Every picture tells a story: Generating sentences from images," in Proc. 11th Eur. Conf. Comput. Vis., 2010, pp. 15-29.
[54] G. Kulkarni et al., "Baby talk: Understanding and generating simple image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1601-1608.
[55] Y. Yang, C. L. Teo, H. Daume III, and Y. Aloimonos, "Corpus-guided sentence generation of natural images," in Proc. Conf. Empirical Methods Natural Language Process., 2011, pp. 444-454.
[57] P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi, "Collective generation of natural image descriptions," in Proc. 50th Annu. Meeting Assoc. Comput. Linguistics: Long Papers, 2012, pp. 359-368.
[58] R. Kiros, R. Salakhutdinov, and R. Zemel, "Multimodal neural language models," in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 595-603.
[59] S. Guadarrama et al., "YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 2712-2719.
[60] M. U. G. Khan, L. Zhang, and Y. Gotoh, "Human focused video description," in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2011, pp. 1480-1487.
[62] P. Das, C. Xu, R. Doell, and J. Corso, "Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2013, pp. 2634-2641.
[63] C. C. Tan, Y.-G. Jiang, and C.-W. Ngo, "Towards textually describing complex video contents with audio-visual concept classifiers," in Proc. 19th ACM Int. Conf. Multimedia, 2011, pp. 655-658.
[64] J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. J. Mooney, "Integrating language and vision to generate natural language descriptions of videos in the wild," in Proc. 25th Int. Conf. Comput. Linguistics, 2014, pp. 1218-1227.
[65] H. Sak et al., "Sequence discriminative distributed training of long short-term memory recurrent neural networks," presented at the 15th Annu. Conf. Int. Speech Commun. Assoc., Singapore, 2014.
[66] J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, "Beyond short snippets: Deep networks for video classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 4694-4702.
[67] S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and F.-F. Li, "Every moment counts: Dense detailed labeling of actions in complex videos," CoRR, vol. abs/1507.05738, 2015, http://arxiv.org/abs/1507.05738.
[68] L. A. Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell, "Deep compositional captioning: Describing novel object categories without paired training data," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2016.
[69] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko, "Translating videos to natural language using deep recurrent neural networks," in Proc. North Amer. Chapter Assoc. Comput. Linguistics-Human Language Technol., 2015, pp. 1494-1504.
[70] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko, "Sequence to sequence - video to text," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4534-4542.
[71] L. Yao et al., "Describing videos by exploiting temporal structure," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4507-4515.
[72] A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele, "Grounding of textual phrases in images by reconstruction," CoRR, vol. abs/1511.03745, 2015, http://arxiv.org/abs/1511.03745.
[73] R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell, "Natural language object retrieval," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2016, pp. 4555-4564.