



Volume 39, Issue 4, 2017, Pages 652-663

Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge

Author keywords

Image captioning; language model; recurrent neural network; sequence to sequence

Indexed keywords

COMPUTATIONAL LINGUISTICS; NATURAL LANGUAGE PROCESSING SYSTEMS; RECURRENT NEURAL NETWORKS;

EID: 85015770940     PISSN: 01628828     EISSN: None     Source Type: Journal    
DOI: 10.1109/TPAMI.2016.2587640     Document Type: Article
Times cited: 846

References (49)
  • 2
    • 78149311145
    • Every picture tells a story: Generating sentences from images
    • A. Farhadi, et al., "Every picture tells a story: Generating sentences from images," in Proc. 11th Eur. Conf. Comput. Vis.: Part IV, 2010, pp. 15-29.
    • (2010) Proc. 11th Eur. Conf. Comput. Vis.: Part IV , pp. 15-29
    • Farhadi, A.1
  • 3
    • 80052901011
    • Baby talk: Understanding and generating simple image descriptions
    • G. Kulkarni, et al., "Baby talk: Understanding and generating simple image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1601-1608.
    • (2011) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 1601-1608
    • Kulkarni, G.1
  • 8
    • 0030397830
    • Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences
    • R. Gerber and H.-H. Nagel, "Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences," in Proc. Int. Conf. Image Process., 1996, pp. 805-808.
    • (1996) Proc. Int. Conf. Image Process. , pp. 805-808
    • Gerber, R.1    Nagel, H.-H.2
  • 9
    • 77954862144
    • I2t: Image parsing to text description
    • B. Z. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu, "I2t: Image parsing to text description," Proc. IEEE, vol. 98, no. 8, pp. 1485-1508, Aug. 2010.
    • (2010) Proc. IEEE , vol.98 , Issue.8 , pp. 1485-1508
    • Yao, B.Z.1    Yang, X.2    Lin, L.3    Lee, M.W.4    Zhu, S.-C.5
  • 14
  • 16
    • 84883394520
    • Framing image description as a ranking task: Data, models and evaluation metrics
    • M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics," J. Artif. Intell. Res., vol. 47, pp. 853-899, 2013.
    • (2013) J. Artif. Intell. Res. , vol.47 , pp. 853-899
    • Hodosh, M.1    Young, P.2    Hockenmaier, J.3
  • 24
    • 84969584486
    • Batch normalization: Accelerating deep network training by reducing internal covariate shift
    • S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. Int. Conf. Mach. Learn., 2015.
    • (2015) Proc. Int. Conf. Mach. Learn.
    • Ioffe, S.1    Szegedy, C.2
  • 25
    • 0031573117
    • Long short-term memory
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
    • (1997) Neural Comput. , vol.9 , Issue.8 , pp. 1735-1780
    • Hochreiter, S.1    Schmidhuber, J.2
  • 30
    • 84959236502
    • Long-term recurrent convolutional networks for visual recognition and description
    • J. Donahue, et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015.
    • (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog.
    • Donahue, J.1
  • 31
    • 84970002232
    • Show, attend and tell: Neural image caption generation with visual attention
    • K. Xu, et al., "Show, attend and tell: Neural image caption generation with visual attention," in Proc. Int. Conf. Mach. Learn., 2015.
    • (2015) Proc. Int. Conf. Mach. Learn.
    • Xu, K.1
  • 33
    • 85015796277
    • Mind's eye: A recurrent visual representation for image caption generation
    • X. Chen and C. L. Zitnick, "Mind's eye: A recurrent visual representation for image caption generation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015.
    • (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog.
    • Chen, X.1    Zitnick, C.L.2
  • 34
    • 84944096380
    • Language models for image captioning: The quirks and what works
    • J. Devlin, et al., "Language models for image captioning: The quirks and what works," in Proc. Assoc. Comput. Linguistics, 2015.
    • (2015) Proc. Assoc. Comput. Linguistics
    • Devlin, J.1
  • 35
    • 84919881041
    • Decaf: A deep convolutional activation feature for generic visual recognition
    • J. Donahue, et al., "Decaf: A deep convolutional activation feature for generic visual recognition," in Proc. Int. Conf. Mach. Learn., 2014.
    • (2014) Proc. Int. Conf. Mach. Learn.
    • Donahue, J.1
  • 43
    • 84906494296
    • From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
    • P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 67-78, 2014.
    • (2014) Trans. Assoc. Comput. Linguistics , vol.2 , pp. 67-78
    • Young, P.1    Lai, A.2    Hodosh, M.3    Hockenmaier, J.4
  • 47
  • 49
    • 0030211964
    • Bagging predictors
    • L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123-140, 1996.
    • (1996) Mach. Learn. , vol.24 , Issue.2 , pp. 123-140
    • Breiman, L.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.