1. Sadeghi, M.A., Farhadi, A.: Recognition using visual phrases. In: CVPR, pp. 1745–1752 (2011)
2. Gupta, A., Verma, Y., Jawahar, C.: Choosing linguistics over vision to describe images. In: AAAI, pp. 606–612 (2012)
3. Bernardi, R., Cakici, R., Elliott, D., Erdem, A., Erdem, E., Ikizler-Cinbis, N., Keller, F., Muscat, A., Plank, B.: Automatic description generation from images: a survey of models, datasets, and evaluation measures. J. Artif. Intell. Res. 55, 409–442 (2016)
4. Rasiwasia, N., Costa Pereira, J., Coviello, E., Doyle, G., Lanckriet, G.R., Levy, R., Vasconcelos, N.: A new approach to cross-modal multimedia retrieval. In: ACMMM, pp. 251–260 (2010)
5. Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., Yuille, A.: Deep captioning with multimodal recurrent neural networks (M-RNN). arXiv preprint arXiv:1412.6632 (2014)
6. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: CVPR, pp. 3156–3164 (2015)
7. Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: CVPR, pp. 3128–3137 (2015)
8. Kiros, R., Salakhutdinov, R., Zemel, R.: Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539 (2014)
9. Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: CVPR, pp. 2625–2634 (2015)
10. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 2048–2057 (2015)
11. Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)
13. Yngve, V.: A model and an hypothesis for language structure. Proc. Am. Philos. Soc. 104, 444–466 (1960)
14. Tai, K.S., Socher, R., Manning, C.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)
15. Rashtchian, C., Young, P., Hodosh, M., Hockenmaier, J.: Collecting image annotations using Amazon’s mechanical turk. In: NAACL Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pp. 139–147 (2010)
16. Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2, 67–78 (2014)
18. Hodosh, M., Young, P., Hockenmaier, J.: Framing image description as a ranking task: data, models and evaluation metrics. J. Artif. Intell. Res. 47, 853–899 (2013)
19. Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al.: Devise: a deep visual-semantic embedding model. In: NIPS, pp. 2121–2129 (2013)
20. Socher, R., Karpathy, A., Le, Q.V., Manning, C.D., Ng, A.: Grounded compositional semantics for finding and describing images with sentences. Trans. Assoc. Comput. Linguist. 2, 207–218 (2014)
21. Karpathy, A., Joulin, A., Fei-Fei, L.: Deep fragment embeddings for bidirectional image sentence mapping. In: NIPS, pp. 1889–1897 (2014)
22. Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS, pp. 2222–2230 (2012)
23. Jia, Y., Salzmann, M., Darrell, T.: Learning cross-modality similarity for multinomial data. In: ICCV, pp. 2407–2414 (2011)
24. Kiros, R., Salakhutdinov, R., Zemel, R.: Multimodal neural language models. In: ICML, pp. 595–603 (2014)
25. Kuznetsova, P., Ordonez, V., Berg, T., Choi, Y.: Treetalk: composition and compression of trees for image descriptions. Trans. Assoc. Comput. Linguist. 2, 351–362 (2014)
26. Farhadi, A., Hejrati, M., Sadeghi, M.A., Young, P., Rashtchian, C., Hockenmaier, J., Forsyth, D.: Every picture tells a story: generating sentences from images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 15–29. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15561-1_2
27. Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., Berg, A., Berg, T.: Babytalk: understanding and generating simple image descriptions. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2891–2903 (2013)
28. Yang, Y., Teo, C.L., Daumé III, H., Aloimonos, Y.: Corpus-guided sentence generation of natural images. In: EMNLP, pp. 444–454 (2011)
29. Mitchell, M., Han, X., Dodge, J., Mensch, A., Goyal, A., Berg, A., Yamaguchi, K., Berg, T., Stratos, K., Daumé III, H.: Midge: generating image descriptions from computer vision detections. In: EACL, pp. 747–756 (2012)
30. Gupta, A., Mannem, P.: From image annotation to image description. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds.) ICONIP 2012. LNCS, vol. 7667, pp. 196–204. Springer, Heidelberg (2012). doi:10.1007/978-3-642-34500-5_24
31. Li, S., Kulkarni, G., Berg, T., Berg, A., Choi, Y.: Composing simple image descriptions using web-scale n-grams. In: CoNLL, pp. 220–228 (2011)
32. Kuznetsova, P., Ordonez, V., Berg, A., Berg, T., Choi, Y.: Collective generation of natural image descriptions. In: ACL, pp. 359–368 (2012)
34. Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: ACL, pp. 55–60 (2014)
36. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
37. Hinton, G., Srivastava, N., Swersky, K.: Lecture 6a overview of mini-batch gradient descent (2012). Coursera lecture slides. https://class.coursera.org/neuralnets-2012-001/lecture
38. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
39. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: ACL, pp. 311–318 (2002)