SCOPUS Information Retrieval Platform

IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume 39, Issue 4, 2017, Pages 664-676

Deep Visual-Semantic Alignments for Generating Image Descriptions

(2) Karpathy, Andrej (a); Fei-Fei, Li (a)

(a) Stanford University (United States)

Author keywords

deep neural networks; Image captioning; language model; recurrent neural network; visual semantic embeddings

Indexed keywords

ALIGNMENT; COMPUTATIONAL LINGUISTICS; DEEP NEURAL NETWORKS; NETWORK ARCHITECTURE; NEURAL NETWORKS; RECURRENT NEURAL NETWORKS; SEMANTICS;

BIDIRECTIONAL RECURRENT NEURAL NETWORKS; CONVOLUTIONAL NEURAL NETWORK; EMBEDDINGS; IMAGE CAPTIONING; IMAGE DESCRIPTIONS; LANGUAGE MODEL; LARGE-SCALE ANALYSIS; NATURAL LANGUAGES;

VISUAL LANGUAGES;

EID: 85015724750 PISSN: 01628828 EISSN: None Source Type: Journal
DOI: 10.1109/TPAMI.2016.2598339 Document Type: Article

Times cited: (795)

References (68)

1
- 33846980853
- What do we perceive in a glance of a real-world scene?
- L. Fei-Fei, A. Iyer, C. Koch, and P. Perona, "What do we perceive in a glance of a real-world scene?" J. Vis., vol. 7, no. 1, 2007, Art. no. 10.
- (2007) J. Vis. , vol.7 , Issue.1
- Fei-Fei, L.¹ Iyer, A.² Koch, C.³ Perona, P.⁴

2
- 77951298115
- The Pascal visual object classes (VOC) challenge
- Jun.
- M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The Pascal visual object classes (VOC) challenge," Int. J. Comput. Vis., vol. 88, no. 2, pp. 303-338, Jun. 2010.
- (2010) Int. J. Comput. Vis. , vol.88 , Issue.2 , pp. 303-338
- Everingham, M.¹ Van Gool, L.² Williams, C.K.I.³ Winn, J.⁴ Zisserman, A.⁵

3
- 84947041871
- Imagenet large scale visual recognition challenge
- O. Russakovsky, et al., "Imagenet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211-252, 2015.
- (2015) Int. J. Comput. Vis. , vol.115 , Issue.3 , pp. 211-252
- Russakovsky, O.¹

4
- 80052901011
- Baby talk: Understanding and generating simple image descriptions
- G. Kulkarni, et al., "Baby talk: Understanding and generating simple image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1601-1608.
- (2011) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 1601-1608
- Kulkarni, G.¹

5
- 78149311145
- Every picture tells a story: Generating sentences from images
- A. Farhadi, et al., "Every picture tells a story: Generating sentences from images," in Proc. 11th Eur. Conf. Comput. Vis., 2010, pp. 15-29.
- (2010) Proc. 11th Eur. Conf. Comput. Vis. , pp. 15-29
- Farhadi, A.¹

6
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics," J. Artificial Intell. Res., vol. 47, pp. 853-899, 2013.
- (2013) J. Artificial Intell. Res. , vol.47 , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

7
- 84944115859
- arXiv preprint arXiv:1411.5654
- X. Chen and C. L. Zitnick, "Learning a recurrent visual representation for image caption generation," arXiv preprint arXiv:1411.5654, 2014.
- (2014) Learning A Recurrent Visual Representation for Image Caption Generation
- Chen, X.¹ Zitnick, C.L.²

8
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 67-78, 2014.
- (2014) Trans. Assoc. Comput. Linguistics , vol.2 , pp. 67-78
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

9
- 0041876117
- Matching words and pictures
- K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. M. Blei, and M. I. Jordan, "Matching words and pictures," J. Mach. Learn. Res., vol. 3, pp. 1107-1135, 2003.
- (2003) J. Mach. Learn. Res. , vol.3 , pp. 1107-1135
- Barnard, K.¹ Duygulu, P.² Forsyth, D.³ De Freitas, N.⁴ Blei, D.M.⁵ Jordan, M.I.⁶

10
- 77955998009
- Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
- R. Socher and L. Fei-Fei, "Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2010, pp. 966-973.
- (2010) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 966-973
- Socher, R.¹ Fei-Fei, L.²

11
- 84887365305
- A sentence is worth a thousand pixels
- S. Fidler, A. Sharma, and R. Urtasun, "A sentence is worth a thousand pixels," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2013, pp. 1995-2002.
- (2013) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 1995-2002
- Fidler, S.¹ Sharma, A.² Urtasun, R.³

12
- 77953205895
- Decomposing a scene into geometric and semantically consistent regions
- S. Gould, R. Fulton, and D. Koller, "Decomposing a scene into geometric and semantically consistent regions," in Proc. IEEE 12th Int. Conf. Comput. Vis., 2009, pp. 1-8.
- (2009) Proc. IEEE 12th Int. Conf. Comput. Vis. , pp. 1-8
- Gould, S.¹ Fulton, R.² Koller, D.³

13
- 50649103674
- What, where and who? Classifying events by scene and object recognition
- L.-J. Li and L. Fei-Fei, "What, where and who? classifying events by scene and object recognition," in Proc. Int. Conf. Comput. Vis., 2007, pp. 1-8.
- (2007) Proc. Int. Conf. Comput. Vis. , pp. 1-8
- Li, L.-J.¹ Fei-Fei, L.²

14
- 70450219021
- Towards total scene understanding: Classification, annotation and segmentation in an automatic framework
- L.-J. Li, R. Socher, and L. Fei-Fei, "Towards total scene understanding: Classification, annotation and segmentation in an automatic framework," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2009, pp. 2036-2043.
- (2009) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 2036-2043
- Li, L.-J.¹ Socher, R.² Fei-Fei, L.³

15
- 84856653718
- Learning cross-modality similarity for multinomial data
- Y. Jia, M. Salzmann, and T. Darrell, "Learning cross-modality similarity for multinomial data," in Proc. IEEE Int. Conf. Comput. Vis., 2011, pp. 2407-2414.
- (2011) Proc. IEEE Int. Conf. Comput. Vis. , pp. 2407-2414
- Jia, Y.¹ Salzmann, M.² Darrell, T.³

16
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- V. Ordonez, G. Kulkarni, and T. L. Berg, "Im2text: Describing images using 1 million captioned photographs," in Proc. Advances Neural Inf. Process. Syst., 2011, pp. 1143-1151.
- (2011) Proc. Advances Neural Inf. Process. Syst. , pp. 1143-1151
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

17
- 84906925854
- Grounded compositional semantics for finding and describing images with sentences
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng, "Grounded compositional semantics for finding and describing images with sentences," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 207-218, 2014.
- (2014) Trans. Assoc. Comput. Linguistics , vol.2 , pp. 207-218
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

18
- 84878189119
- Collective generation of natural image descriptions
- P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi, "Collective generation of natural image descriptions," in Proc. 50th Annu. Meeting Assoc. Comput. Linguistics, 2012, pp. 359-368.
- (2012) Proc. 50th Annu. Meeting Assoc. Comput. Linguistics , pp. 359-368
- Kuznetsova, P.¹ Ordonez, V.² Berg, A.C.³ Berg, T.L.⁴ Choi, Y.⁵

19
- 84934873221
- Treetalk: Composition and compression of trees for image descriptions
- P. Kuznetsova, V. Ordonez, T. L. Berg, U. C. Hill, and Y. Choi, "Treetalk: Composition and compression of trees for image descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, no. 10, pp. 351-362, 2014.
- (2014) Trans. Assoc. Comput. Linguistics , vol.2 , Issue.10 , pp. 351-362
- Kuznetsova, P.¹ Ordonez, V.² Berg, T.L.³ Hill, U.C.⁴ Choi, Y.⁵

20
- 84862279067
- Composing simple image descriptions using web-scale n-grams
- S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi, "Composing simple image descriptions using web-scale n-grams," in Proc. 15th Conf. Comput. Natural Language Learn., 2011, pp. 220-228.
- (2011) Proc. 15th Conf. Comput. Natural Language Learn. , pp. 220-228
- Li, S.¹ Kulkarni, G.² Berg, T.L.³ Berg, A.C.⁴ Choi, Y.⁵

21
- 84919886586
- arXiv:1204.2742
- A. Barbu, et al., "Video in sentences out," arXiv:1204.2742, 2012.
- (2012) Video in Sentences Out
- Barbu, A.¹

22
- 84906929591
- Image description using visual dependency representations
- D. Elliott and F. Keller, "Image description using visual dependency representations," in Proc. Empirical Methods Natural Language Process., 2013, pp. 1292-1302.
- (2013) Proc. Empirical Methods Natural Language Process , pp. 1292-1302
- Elliott, D.¹ Keller, F.²

23
- 84973931408
- From image annotation to image description
- Berlin, Germany: Springer
- A. Gupta and P. Mannem, "From image annotation to image description," in Neural Information Processing. Berlin, Germany: Springer, 2012.
- (2012) Neural Information Processing
- Gupta, A.¹ Mannem, P.²

24
- 80053258778
- Corpus-guided sentence generation of natural images
- Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos, "Corpus-guided sentence generation of natural images," in Proc. Conf. Empirical Methods Natural Language Process., 2011, pp. 444-454.
- (2011) Proc. Conf. Empirical Methods Natural Language Process , pp. 444-454
- Yang, Y.¹ Teo, C.L.² Daumé, H.³ Aloimonos, Y.⁴

25
- 77954862144
- I2T: Image parsing to text description
- Aug.
- B. Z. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu, "I2T: Image parsing to text description," in Proc. IEEE, vol. 98, no. 8, pp. 1485-1508, Aug. 2010.
- (2010) Proc. IEEE , vol.98 , Issue.8 , pp. 1485-1508
- Yao, B.Z.¹ Yang, X.² Lin, L.³ Lee, M.W.⁴ Zhu, S.-C.⁵

26
- 85026937926
- See no evil, say no evil: Description generation from densely labeled images
- M. Yatskar, L. Vanderwende, and L. Zettlemoyer, "See no evil, say no evil: Description generation from densely labeled images," in Proc. 3rd Joint Conf. Lexical Comput. Semantics, 2014, pp. 110-120.
- (2014) Proc. 3rd Joint Conf. Lexical Comput. Semantics , pp. 110-120
- Yatskar, M.¹ Vanderwende, L.² Zettlemoyer, L.³

27
- 85034832841
- Midge: Generating image descriptions from computer vision detections
- M. Mitchell, et al., "Midge: Generating image descriptions from computer vision detections," in Proc. 13th Conf. Eur. Chapter Assoc. Comput. Linguistics, 2012, pp. 747-756.
- (2012) Proc. 13th Conf. Eur. Chapter Assoc. Comput. Linguistics , pp. 747-756
- Mitchell, M.¹

28
- 84937843643
- Deep fragment embeddings for bidirectional image sentence mapping
- A. Karpathy, A. Joulin, and L. Fei-Fei, "Deep fragment embeddings for bidirectional image sentence mapping," Advances in neural information processing systems, 2014, pp. 1889-1897.
- (2014) Advances in Neural Information Processing Systems , pp. 1889-1897
- Karpathy, A.¹ Joulin, A.² Fei-Fei, L.³

29
- 84929363334
- Multimodal neural language models
- R. Kiros, R. S. Zemel, and R. Salakhutdinov, "Multimodal neural language models," in Proc. Int. Conf. Mach. Learn., 2014, pp. 595-603.
- (2014) Proc. Int. Conf. Mach. Learn. , pp. 595-603
- Kiros, R.¹ Zemel, R.S.² Salakhutdinov, R.³

30
- 84944115859
- Learning a recurrent visual representation for image caption generation
- [Online]
- X. Chen and C. L. Zitnick, "Learning a recurrent visual representation for image caption generation," CoRR, 2014. [Online]. Available: http://arxiv.org/abs/1411.5654
- (2014) CoRR
- Chen, X.¹ Zitnick, C.L.²

31
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 2625-2634.
- (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 2625-2634
- Donahue, J.¹

32
- 84959250180
- From captions to visual concepts and back
- H. Fang, et al., "From captions to visual concepts and back," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 1473-1482.
- (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 1473-1482
- Fang, H.¹

33
- 84951072975
- arXiv:1410.1090
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille, "Explain images with multimodal recurrent neural networks," arXiv:1410.1090, 2014.
- (2014) Explain Images with Multimodal Recurrent Neural Networks
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.L.⁵

34
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 3156-3164.
- (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 3156-3164
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

35
- 5044236741
- Names and faces in the news
- T. L. Berg, "Names and faces in the news," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2004, vol. 2, pp. II-848-II-854.
- (2004) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , vol.2 , pp. 848-854
- Berg, T.L.¹

36
- 84911370987
- What are you talking about? Text-to-image coreference
- C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler, "What are you talking about? text-to-image coreference," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 351-362.
- (2014) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 351-362
- Kong, C.¹ Lin, D.² Bansal, M.³ Urtasun, R.⁴ Fidler, S.⁵

37
- 84911442106
- Visual semantic search: Retrieving videos via complex textual queries
- D. Lin, S. Fidler, C. Kong, and R. Urtasun, "Visual semantic search: Retrieving videos via complex textual queries," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 2657-2667.
- (2014) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 2657-2667
- Lin, D.¹ Fidler, S.² Kong, C.³ Urtasun, R.⁴

38
- 84867118595
- A joint model of language and perception for grounded attribute learning
- Jun.
- C. Matuszek, N. FitzGerald, L. Zettlemoyer, L. Bo, and D. Fox, "A joint model of language and perception for grounded attribute learning," in Proc. 29th Int. Conf. Mach. Learn., Jun. 2012, pp. 1671-1678.
- (2012) Proc. 29th Int. Conf. Mach. Learn. , pp. 1671-1678
- Matuszek, C.¹ FitzGerald, N.² Zettlemoyer, L.³ Bo, L.⁴ Fox, D.⁵

39
- 84898772194
- Learning the visual interpretation of sentences
- C. L. Zitnick, D. Parikh, and L. Vanderwende, "Learning the visual interpretation of sentences," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 1681-1688.
- (2013) Proc. IEEE Int. Conf. Comput. Vis. , pp. 1681-1688
- Zitnick, C.L.¹ Parikh, D.² Vanderwende, L.³

40
- 84973911532
- Aligning books and movies: Towards story-like visual explanations by watching movies and reading books
- Y. Zhu, et al., "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 19-27.
- (2015) Proc. IEEE Int. Conf. Comput. Vis. , pp. 19-27
- Zhu, Y.¹

41
- 84898958665
- Devise: A deep visual-semantic embedding model
- A. Frome, et al., "Devise: A deep visual-semantic embedding model," in Proc. Advances Neural Inf. Process. Syst., 2013, pp. 2121-2129.
- (2013) Proc. Advances Neural Inf. Process. Syst. , pp. 2121-2129
- Frome, A.¹

42
- 84937843643
- Deep fragment embeddings for bidirectional image sentence mapping
- A. Karpathy, A. Joulin, and L. Fei-Fei, "Deep fragment embeddings for bidirectional image sentence mapping," in Proc. Advances Neural Inf. Process. Syst., 2014, pp. 1889-1897.
- (2014) Proc. Advances Neural Inf. Process. Syst. , pp. 1889-1897
- Karpathy, A.¹ Joulin, A.² Fei-Fei, L.³

43
- 84876231242
- ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Advances Neural Inf. Process. Syst., 2012, pp. 1097-1105.
- (2012) Proc. Advances Neural Inf. Process. Syst. , pp. 1097-1105
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

44
- 0032203257
- Gradient-based learning applied to document recognition
- Nov.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
- (1998) Proc. IEEE , vol.86 , Issue.11 , pp. 2278-2324
- LeCun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

45
- 84894576734
- Neural probabilistic language models
- Berlin, Germany: Springer
- Y. Bengio, H. Schwenk, J.-S. Senécal, F. Morin, and J.-L. Gauvain, "Neural probabilistic language models," in Innovations in Machine Learning. Berlin, Germany: Springer, 2006.
- (2006) Innovations in Machine Learning
- Bengio, Y.¹ Schwenk, H.² Senécal, J.-S.³ Morin, F.⁴ Gauvain, J.-L.⁵

46
- 84961289992
- Glove: Global vectors for word representation
- R. Socher, J. Pennington, and C. Manning, "Glove: Global vectors for word representation," in Proc. Empirical Methods Natural Language Process., 2014, pp. 1532-1543.
- (2014) Proc. Empirical Methods Natural Language Process , pp. 1532-1543
- Socher, R.¹ Pennington, J.² Manning, C.³

47
- 84898956512
- Distributed representations of words and phrases and their compositionality
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. Advances Neural Inf. Process. Syst., 2013, pp. 3111-3119.
- (2013) Proc. Advances Neural Inf. Process. Syst. , pp. 3111-3119
- Mikolov, T.¹ Sutskever, I.² Chen, K.³ Corrado, G.S.⁴ Dean, J.⁵

48
- 0142166851
- A neural probabilistic language model
- Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin, "A neural probabilistic language model," J. Mach. Learn. Res., vol. 3, pp. 1137-1155, 2003.
- (2003) J. Mach. Learn. Res. , vol.3 , pp. 1137-1155
- Bengio, Y.¹ Ducharme, R.² Vincent, P.³ Janvin, C.⁴

49
- 79959829092
- Recurrent neural network based language model
- T. Mikolov, M. Karafiát, L. Burget, J. Cernockỳ, and S. Khudanpur, "Recurrent neural network based language model," in Proc. 11th Annu. Conf. Int. Speech Commun. Assoc., 2010, pp. 1045-1048.
- (2010) Proc. 11th Annu. Conf. Int. Speech Commun. Assoc. , pp. 1045-1048
- Mikolov, T.¹ Karafiát, M.² Burget, L.³ Cernockỳ, J.⁴ Khudanpur, S.⁵

50
- 80053459857
- Generating text with recurrent neural networks
- I. Sutskever, J. Martens, and G. E. Hinton, "Generating text with recurrent neural networks," in Proc. 28th Int. Conf. Mach. Learn., 2011, pp. 1017-1024.
- (2011) Proc. 28th Int. Conf. Mach. Learn. , pp. 1017-1024
- Sutskever, I.¹ Martens, J.² Hinton, G.E.³

51
- 84911400494
- Rich feature hierarchies for accurate object detection and semantic segmentation
- R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 580-587.
- (2014) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 580-587
- Girshick, R.¹ Donahue, J.² Darrell, T.³ Malik, J.⁴

52
- 85198028989
- ImageNet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2009, pp. 248-255.
- (2009) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 248-255
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

53
- 0031268931
- Bidirectional recurrent neural networks
- Nov.
- M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673-2681, Nov. 1997.
- (1997) IEEE Trans. Signal Process , vol.45 , Issue.11 , pp. 2673-2681
- Schuster, M.¹ Paliwal, K.K.²

54
- 84935113569
- Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
- Apr.
- A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inf. Theory, vol. TIT-13, no. 2, pp. 260-269, Apr. 1967.
- (1967) IEEE Trans. Inf. Theory , vol.TIT-13 , Issue.2 , pp. 260-269
- Viterbi, A.J.¹

55
- 26444565569
- Finding structure in time
- J. L. Elman, "Finding structure in time," Cogn. Science, vol. 14, no. 2, pp. 179-211, 1990.
- (1990) Cogn. Science , vol.14 , Issue.2 , pp. 179-211
- Elman, J.L.¹

56
- 0003407429
- Berlin, Germany: Springer
- W. Zhang, State-Space Search: Algorithms, Complexity, Extensions, and Applications. Berlin, Germany: Springer, 1999.
- (1999) State-Space Search: Algorithms, Complexity, Extensions, and Applications
- Zhang, W.¹

57
- 84893343292
- Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude
- T. Tieleman and G. Hinton, "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude," COURSERA: Neural Networks for Machine Learning, vol. 4, no. 2, 2012.
- (2012) COURSERA: Neural Networks for Machine Learning , vol.4 , Issue.2
- Tieleman, T.¹ Hinton, G.²

58
- 84943546021
- T. Tieleman and G. E. Hinton, "Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude," 2012.
- (2012) Lecture 6.5-RmsProp: Divide the Gradient by A Running Average of Its Recent Magnitude
- Tieleman, T.¹ Hinton, G.E.²

59
- 84978730111
- [Online]
- R. Krishna, et al., "Visual genome: Connecting language and vision using crowdsourced dense image annotations," 2016. [Online]. Available: http://arxiv.org/abs/1602.07332
- (2016) Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Krishna, R.¹

60
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
- (1997) Neural Comput. , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

61
- 84925410541
- arXiv:1409.1556
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

62
- 84937522268
- Going deeper with convolutions
- C. Szegedy, et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 1-9.
- (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 1-9
- Szegedy, C.¹

63
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, 2002, pp. 311-318.
- (2002) Proc. 40th Annu. Meeting Assoc. Comput. Linguistics , pp. 311-318
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

64
- 84926007060
- METEOR universal: Language specific translation evaluation for any target language
- M. Denkowski and A. Lavie, "METEOR universal: Language specific translation evaluation for any target language," in Proc. 9th Workshop Statistical Mach. Transl., 2014, pp. 67-78.
- (2014) Proc. 9th Workshop Statistical Mach. Transl. , pp. 67-78
- Denkowski, M.¹ Lavie, A.²

65
- 84956980995
- CIDEr: Consensus-based image description evaluation
- R. Vedantam, C. Lawrence Zitnick, and D. Parikh, "CIDEr: Consensus-based image description evaluation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 4566-4575.
- (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog. , pp. 4566-4575
- Vedantam, R.¹ Lawrence Zitnick, C.² Parikh, D.³

66
- 84952349295
- arXiv:1504.00325
- X. Chen, et al., "Microsoft Coco captions: Data collection and evaluation server," arXiv:1504.00325, 2015.
- (2015) Microsoft Coco Captions: Data Collection and Evaluation Server
- Chen, X.¹

67
- 84965102873
- arXiv:1505.04467
- J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick, "Exploring nearest neighbor approaches for image captioning," arXiv:1505.04467, 2015.
- (2015) Exploring Nearest Neighbor Approaches for Image Captioning
- Devlin, J.¹ Gupta, S.² Girshick, R.³ Mitchell, M.⁴ Zitnick, C.L.⁵

68
- 84888340666
- Torch7: A Matlab-like environment for machine learning
- R. Collobert, K. Kavukcuoglu, and C. Farabet, "Torch7: A Matlab-like environment for machine learning," in Proc. Big Learn Advances Neural Inf. Process. Syst. Workshop, 2011, pp. 1681-1688.
- (2011) Proc. Big Learn Advances Neural Inf. Process. Syst. Workshop , pp. 1681-1688
- Collobert, R.¹ Kavukcuoglu, K.² Farabet, C.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.