Volume 39, Issue 4, 2017, Pages 677-691

Long-Term Recurrent Convolutional Networks for Visual Recognition and Description

Author keywords

Computer vision; convolutional nets; deep learning; transfer learning

Indexed keywords

ARTIFICIAL INTELLIGENCE; COMPUTER VISION; DEEP LEARNING;

EID: 85020685307     PISSN: 01628828     EISSN: None     Source Type: Journal    
DOI: 10.1109/TPAMI.2016.2599174     Document Type: Article
Times cited : (1117)

References (73)
  • 1
    • 84870183903
    • 3D convolutional neural networks for human action recognition
    • Jan
    • S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 221-231, Jan. 2013
    • (2013) IEEE Trans. Pattern Anal. Mach. Intell , vol.35 , Issue.1 , pp. 221-231
    • Ji, S.1    Xu, W.2    Yang, M.3    Yu, K.4
  • 4
    • 84937862424
    • Two-stream convolutional networks for action recognition in videos
    • K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," in Proc. Advances Neural Inf. Process. Syst., 2014, pp. 568-576
    • (2014) Proc. Advances Neural Inf. Process. Syst , pp. 568-576
    • Simonyan, K.1    Zisserman, A.2
  • 6
    • 0001202594
    • A learning algorithm for continually running fully recurrent neural networks
    • Cambridge, MA, USA: MIT Press
    • R. J. Williams and D. Zipser, "A learning algorithm for continually running fully recurrent neural networks," in Neural Computation. Cambridge, MA, USA: MIT Press, 1989
    • (1989) Neural Computation
    • Williams, R.J.1    Zipser, D.2
  • 7
    • 0031573117
    • Long short-term memory
    • Cambridge, MA, USA: MIT Press
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," in Neural Computation. Cambridge, MA, USA: MIT Press, 1997
    • (1997) Neural Computation
    • Hochreiter, S.1    Schmidhuber, J.2
  • 8
    • 84919832465
    • Towards end-to-end speech recognition with recurrent neural networks
    • A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. 31st Int. Conf. Mach. Learn., 2014, pp. 1764-1772
    • (2014) Proc. 31st Int. Conf. Mach. Learn , pp. 1764-1772
    • Graves, A.1    Jaitly, N.2
  • 12
    • 84913580146
    • Caffe: Convolutional architecture for fast feature embedding
    • Y. Jia, et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 675-678
    • (2014) Proc. 22nd ACM Int. Conf. Multimedia , pp. 675-678
    • Jia, Y.1
  • 13
    • 84959894695
    • Learning to execute
    • abs/1410.4615
    • W. Zaremba and I. Sutskever, "Learning to execute," CoRR, vol. abs/1410.4615, 2014, http://arxiv.org/abs/1410.4615
    • (2014) CoRR
    • Zaremba, W.1    Sutskever, I.2
  • 14
    • 84953873103
    • Generating sequences with recurrent neural networks
    • abs/1308.0850
    • A. Graves, "Generating sequences with recurrent neural networks," CoRR, vol. abs/1308.0850, 2013, http://arxiv.org/abs/1308.0850
    • (2013) CoRR
    • Graves, A.1
  • 22
    • 84906489074
    • Visualizing and understanding convolutional networks
    • M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818-833
    • (2014) Proc. Eur. Conf. Comput. Vis , pp. 818-833
    • Zeiler, M.D.1    Fergus, R.2
  • 23
    • 84947041871
    • ImageNet large scale visual recognition challenge
    • O. Russakovsky, et al., "ImageNet Large Scale Visual Recognition Challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211-252, 2015
    • (2015) Int. J. Comput. Vis. , vol.115 , Issue.3 , pp. 211-252
    • Russakovsky, O.1
  • 26
    • 84883394520
    • Framing image description as a ranking task: Data, models and evaluation metrics
    • M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics," J. Artificial Intell. Res., vol. 47, no. 1, pp. 853-899, 2013
    • (2013) J. Artificial Intell. Res. , vol.47 , Issue.1 , pp. 853-899
    • Hodosh, M.1    Young, P.2    Hockenmaier, J.3
  • 27
  • 29
  • 30
    • 84898958665
    • Devise: A deep visual-semantic embedding model
    • A. Frome, et al., "Devise: A deep visual-semantic embedding model," in Advances Neural Inf. Process. Syst., 2013, pp. 2121-2129
    • (2013) Advances Neural Inf. Process. Syst , pp. 2121-2129
    • Frome, A.1
  • 31
    • 84946802533
    • Unifying visual-semantic embeddings with multimodal neural language models
    • abs/1411.2539
    • R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Unifying visual-semantic embeddings with multimodal neural language models," CoRR, vol. abs/1411.2539, 2014, http://arxiv.org/abs/1411.2539
    • (2014) CoRR
    • Kiros, R.1    Salakhutdinov, R.2    Zemel, R.S.3
  • 32
    • 84906494296
    • From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
    • P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 67-78, 2014
    • (2014) Trans. Assoc. Comput. Linguistics , vol.2 , pp. 67-78
    • Young, P.1    Lai, A.2    Hodosh, M.3    Hockenmaier, J.4
  • 39
    • 84944096380
    • Language models for image captioning: The quirks and what works
    • J. Devlin, et al., "Language models for image captioning: The quirks and what works," in Proc. 53rd Annu. Meeting Assoc. Comput. Linguistics, 2015, pp. 100-105
    • (2015) Proc. 53rd Annu. Meeting Assoc. Comput. Linguistics , pp. 100-105
    • Devlin, J.1
  • 40
    • 84973863256
    • Learning like a child: Fast novel visual concept learning from sentence descriptions of images
    • J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille, "Learning like a child: Fast novel visual concept learning from sentence descriptions of images," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 2533-2541
    • (2015) Proc. IEEE Int. Conf. Comput. Vis , pp. 2533-2541
    • Mao, J.1    Xu, W.2    Yang, Y.3    Wang, J.4    Huang, Z.5    Yuille, A.6
  • 41
  • 42
    • 84986257720
    • Exploring nearest neighbor approaches for image captioning
    • abs/1505.04467
    • J. Devlin, S. Gupta, R. B. Girshick, M. Mitchell, and C. L. Zitnick, "Exploring nearest neighbor approaches for image captioning," CoRR, vol. abs/1505.04467, 2015, http://arxiv.org/abs/1505.04467
    • (2015) CoRR
    • Devlin, J.1    Gupta, S.2    Girshick, R.B.3    Mitchell, M.4    Zitnick, C.L.5
  • 43
    • 84959236502
    • Long-term recurrent convolutional networks for visual recognition and description
    • J. Donahue, et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 2625-2634
    • (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog , pp. 2625-2634
    • Donahue, J.1
  • 44
    • 84970002232
    • Show, attend and tell: Neural image caption generation with visual attention
    • Lille, France
    • K. Xu, et al., "Show, attend and tell: Neural image caption generation with visual attention," presented at the 32nd Int. Conf. Mach. Learn., Lille, France, 2015
    • (2015) The 32nd Int. Conf. Mach. Learn
    • Xu, K.1
  • 45
    • 84946734827
    • Deep visual-semantic alignments for generating image descriptions
    • A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 3128-3137
    • (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog , pp. 3128-3137
    • Karpathy, A.1    Fei-Fei, L.2
  • 49
    • 84876945537
    • Dense trajectories and motion boundary descriptors for action recognition
    • H. Wang, A. Klaser, C. Schmid, and C. Liu, "Dense trajectories and motion boundary descriptors for action recognition," Int. J. Comput. Vis., vol. 103, pp. 60-79, 2013
    • (2013) Int. J. Comput. Vis. , vol.103 , pp. 60-79
    • Wang, H.1    Klaser, A.2    Schmid, C.3    Liu, C.4
  • 50
    • 84898805910
    • Action recognition with improved trajectories
    • H. Wang and C. Schmid, "Action recognition with improved trajectories," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 3551-3558
    • (2013) Proc. IEEE Int. Conf. Comput. Vis , pp. 3551-3558
    • Wang, H.1    Schmid, C.2
  • 53
    • 78149311145
    • Every picture tells a story: Generating sentences from images
    • A. Farhadi, et al., "Every picture tells a story: Generating sentences from images," in Proc. 11th Eur. Conf. Comput. Vis., 2010, pp. 15-29
    • (2010) Proc. 11th Eur. Conf. Comput. Vis , pp. 15-29
    • Farhadi, A.1
  • 54
    • 80052901011
    • Baby talk: Understanding and generating simple image descriptions
    • G. Kulkarni, et al., "Baby talk: Understanding and generating simple image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1601-1608
    • (2011) Proc. IEEE Conf. Comput. Vis. Pattern Recog , pp. 1601-1608
    • Kulkarni, G.1
  • 59
    • 84898773262
    • YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
    • S. Guadarrama, et al., "YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 2712-2719
    • (2013) Proc. IEEE Int. Conf. Comput. Vis , pp. 2712-2719
    • Guadarrama, S.1
  • 62
    • 84887345951
    • Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
    • P. Das, C. Xu, R. Doell, and J. Corso, "Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2013, pp. 2634-2641
    • (2013) Proc. IEEE Conf. Comput. Vis. Pattern Recog , pp. 2634-2641
    • Das, P.1    Xu, C.2    Doell, R.3    Corso, J.4
  • 63
    • 84455192418
    • Towards textually describing complex video contents with audio-visual concept classifiers
    • C. C. Tan, Y.-G. Jiang, and C.-W. Ngo, "Towards textually describing complex video contents with audio-visual concept classifiers," in Proc. 19th ACM Int. Conf. Multimedia, 2011, pp. 655-658
    • (2011) Proc. 19th ACM Int. Conf. Multimedia , pp. 655-658
    • Tan, C.C.1    Jiang, Y.-G.2    Ngo, C.-W.3
  • 65
    • 84910072094
    • Sequence discriminative distributed training of long short-term memory recurrent neural networks
    • Singapore
    • H. Sak, et al., "Sequence discriminative distributed training of long short-term memory recurrent neural networks," presented at the 15th Annu. Conf. Int. Speech Commun. Assoc., Singapore, 2014
    • (2014) The 15th Annu. Conf. Int. Speech Commun. Assoc
    • Sak, H.1
  • 67
    • 84977668095
    • Every moment counts: Dense detailed labeling of actions in complex videos
    • abs/1507.05738
    • S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and F.-F. Li, "Every moment counts: Dense detailed labeling of actions in complex videos," CoRR, vol. abs/1507.05738, 2015, http://arxiv.org/abs/1507.05738
    • (2015) CoRR
    • Yeung, S.1    Russakovsky, O.2    Jin, N.3    Andriluka, M.4    Mori, G.5    Li, F.-F.6
  • 71
    • 84973884896
    • Describing videos by exploiting temporal structure
    • L. Yao, et al., "Describing videos by exploiting temporal structure," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4507-4515
    • (2015) Proc. IEEE Int. Conf. Comput. Vis , pp. 4507-4515
    • Yao, L.1
  • 72
    • 85027437052
    • Grounding of textual phrases in images by reconstruction
    • abs/1511.03745
    • A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele, "Grounding of textual phrases in images by reconstruction," CoRR, vol. abs/1511.03745, 2015, http://arxiv.org/abs/1511.03745
    • (2015) CoRR
    • Rohrbach, A.1    Rohrbach, M.2    Hu, R.3    Darrell, T.4    Schiele, B.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.