Volume 39, Issue 4, 2017, Pages 664-676

Deep Visual-Semantic Alignments for Generating Image Descriptions

Author keywords

deep neural networks; Image captioning; language model; recurrent neural network; visual semantic embeddings

Indexed keywords

ALIGNMENT; COMPUTATIONAL LINGUISTICS; DEEP NEURAL NETWORKS; NETWORK ARCHITECTURE; NEURAL NETWORKS; RECURRENT NEURAL NETWORKS; SEMANTICS

EID: 85015724750     PISSN: 01628828     EISSN: None     Source Type: Journal    
DOI: 10.1109/TPAMI.2016.2598339     Document Type: Article
Times cited: 795

References (68)
  • 1
    • 33846980853
    • What do we perceive in a glance of a real-world scene?
    • L. Fei-Fei, A. Iyer, C. Koch, and P. Perona, "What do we perceive in a glance of a real-world scene?" J. Vis., vol. 7, no. 1, 2007, Art. no. 10.
    • (2007) J. Vis. , vol.7 , Issue.1
    • Fei-Fei, L.1    Iyer, A.2    Koch, C.3    Perona, P.4
  • 3
    • 84947041871
    • Imagenet large scale visual recognition challenge
    • O. Russakovsky, et al., "Imagenet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211-252, 2015.
    • (2015) Int. J. Comput. Vis. , vol.115 , Issue.3 , pp. 211-252
    • Russakovsky, O.1
  • 4
    • 80052901011
    • Baby talk: Understanding and generating simple image descriptions
    • G. Kulkarni, et al., "Baby talk: Understanding and generating simple image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1601-1608.
    • (2011) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 1601-1608
    • Kulkarni, G.1
  • 5
    • 78149311145
    • Every picture tells a story: Generating sentences from images
    • A. Farhadi, et al., "Every picture tells a story: Generating sentences from images," in Proc. 11th Eur. Conf. Comput. Vis., 2010, pp. 15-29.
    • (2010) Proc. 11th Eur. Conf. Comput. Vis. , pp. 15-29
    • Farhadi, A.1
  • 6
    • 84883394520
    • Framing image description as a ranking task: Data, models and evaluation metrics
    • M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics," J. Artificial Intell. Res., vol. 47, pp. 853-899, 2013.
    • (2013) J. Artificial Intell. Res. , vol.47 , pp. 853-899
    • Hodosh, M.1    Young, P.2    Hockenmaier, J.3
  • 8
    • 84906494296
    • From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
    • P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 67-78, 2014.
    • (2014) Trans. Assoc. Comput. Linguistics , vol.2 , pp. 67-78
    • Young, P.1    Lai, A.2    Hodosh, M.3    Hockenmaier, J.4
  • 10
    • 77955998009
    • Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
    • R. Socher and L. Fei-Fei, "Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2010, pp. 966-973.
    • (2010) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 966-973
    • Socher, R.1    Fei-Fei, L.2
  • 13
    • 50649103674
    • What, where and who? Classifying events by scene and object recognition
    • L.-J. Li and L. Fei-Fei, "What, where and who? classifying events by scene and object recognition," in Proc. Int. Conf. Comput. Vis., 2007, pp. 1-8.
    • (2007) Proc. Int. Conf. Comput. Vis. , pp. 1-8
    • Li, L.-J.1    Fei-Fei, L.2
  • 14
    • 70450219021
    • Towards total scene understanding: Classification, annotation and segmentation in an automatic framework
    • L.-J. Li, R. Socher, and L. Fei-Fei, "Towards total scene understanding: Classification, annotation and segmentation in an automatic framework," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2009, pp. 2036-2043.
    • (2009) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 2036-2043
    • Li, L.-J.1    Socher, R.2    Fei-Fei, L.3
  • 17
    • 84906925854
    • Grounded compositional semantics for finding and describing images with sentences
    • R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, "Grounded compositional semantics for finding and describing images with sentences," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 207-218, 2014.
    • (2014) Trans. Assoc. Comput. Linguistics , vol.2 , pp. 207-218
    • Socher, R.1    Karpathy, A.2    Le, Q.V.3    Manning, C.D.4    Ng, A.Y.5
  • 23
    • 84973931408
    • From image annotation to image description
    • A. Gupta and P. Mannem, "From image annotation to image description," in Neural Information Processing. Berlin, Germany: Springer, 2012.
    • (2012) Neural Information Processing
    • Gupta, A.1    Mannem, P.2
  • 25
    • 77954862144
    • I2T: Image parsing to text description
    • B. Z. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu, "I2T: Image parsing to text description," in Proc. IEEE, vol. 98, no. 8, pp. 1485-1508, Aug. 2010.
    • (2010) Proc. IEEE , vol.98 , Issue.8 , pp. 1485-1508
    • Yao, B.Z.1    Yang, X.2    Lin, L.3    Lee, M.W.4    Zhu, S.-C.5
  • 27
  • 30
    • 84944115859
    • Learning a recurrent visual representation for image caption generation
    • X. Chen and C. L. Zitnick, "Learning a recurrent visual representation for image caption generation," CoRR, 2014. [Online]. Available: http://arxiv.org/abs/1411.5654
    • (2014) CoRR
    • Chen, X.1    Zitnick, C.L.2
  • 31
    • 84959236502
    • Long-term recurrent convolutional networks for visual recognition and description
    • J. Donahue, et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 2625-2634.
    • (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 2625-2634
    • Donahue, J.1
  • 40
    • 84973911532
    • Aligning books and movies: Towards story-like visual explanations by watching movies and reading books
    • Y. Zhu, et al., "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 19-27.
    • (2015) Proc. IEEE Int. Conf. Comput. Vis. , pp. 19-27
    • Zhu, Y.1
  • 41
    • 84898958665
    • Devise: A deep visual-semantic embedding model
    • A. Frome, et al., "Devise: A deep visual-semantic embedding model," in Proc. Advances Neural Inf. Process. Syst., 2013, pp. 2121-2129.
    • (2013) Proc. Advances Neural Inf. Process. Syst. , pp. 2121-2129
    • Frome, A.1
  • 44
    • 0032203257
    • Gradient-based learning applied to document recognition
    • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
    • (1998) Proc. IEEE , vol.86 , Issue.11 , pp. 2278-2324
    • LeCun, Y.1    Bottou, L.2    Bengio, Y.3    Haffner, P.4
  • 53
    • 0031268931
    • Bidirectional recurrent neural networks
    • M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673-2681, Nov. 1997.
    • (1997) IEEE Trans. Signal Process , vol.45 , Issue.11 , pp. 2673-2681
    • Schuster, M.1    Paliwal, K.K.2
  • 54
    • 84935113569
    • Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
    • A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inf. Theory, vol. TIT-13, no. 2, pp. 260-269, Apr. 1967.
    • (1967) IEEE Trans. Inf. Theory , vol.TIT-13 , Issue.2 , pp. 260-269
    • Viterbi, A.J.1
  • 55
    • 26444565569
    • Finding structure in time
    • J. L. Elman, "Finding structure in time," Cogn. Science, vol. 14, no. 2, pp. 179-211, 1990.
    • (1990) Cogn. Science , vol.14 , Issue.2 , pp. 179-211
    • Elman, J.L.1
  • 57
    • 84893343292
    • Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude
    • T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural Networks for Machine Learning, vol. 4, no. 2, 2012.
    • (2012) COURSERA: Neural Networks for Machine Learning , vol.4 , Issue.2
    • Tieleman, T.1    Hinton, G.2
  • 60
    • 0031573117
    • Long short-term memory
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
    • (1997) Neural Comput. , vol.9 , Issue.8 , pp. 1735-1780
    • Hochreiter, S.1    Schmidhuber, J.2
  • 64
    • 84926007060
    • METEOR universal: Language specific translation evaluation for any target language
    • M. Denkowski and A. Lavie, "METEOR universal: Language specific translation evaluation for any target language," in Proc. 9th Workshop Statistical Mach. Transl., 2014, pp. 67-78.
    • (2014) Proc. 9th Workshop Statistical Mach. Transl. , pp. 67-78
    • Denkowski, M.1    Lavie, A.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.