SCOPUS Information Search Platform

Advances in Neural Information Processing Systems

Volume 2015-January, 2015, Pages 2953-2961

Exploring models and data for image question answering

(3) Ren, Mengye (a); Kiros, Ryan (a); Zemel, Richard S. (a, b)

a UNIVERSITY OF TORONTO (Canada)

b CANADIAN INSTITUTE FOR ADVANCED RESEARCH (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

INFORMATION SCIENCE; OBJECT DETECTION; SEMANTICS;

BASELINE RESULTS; EMBEDDINGS; GENERATION ALGORITHM; IMAGE DESCRIPTIONS; IMAGE-BASED; INTERMEDIATE STAGE; QUESTION ANSWERING; VISUAL SEMANTICS;

IMAGE SEGMENTATION;

EID: 84965170394
PISSN: 10495258
EISSN: None
DOI: None
Source Type: Conference Proceeding
Document Type: Conference Paper

Times cited : (705)

References (32)

1
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator", In CVPR, 2015.
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

2
- 84952349298
- Unifying visual-semantic embeddings with multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Unifying visual-semantic embeddings with multimodal neural language models", TACL, 2015.
- (2015) TACL
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

3
- 84959252592
- Deep fragment embeddings for bidirectional image sentence mapping
- A. Karpathy, A. Joulin, and L. Fei-Fei, "Deep fragment embeddings for bidirectional image sentence mapping", In NIPS, 2013.
- (2013) NIPS
- Karpathy, A.¹ Joulin, A.² Fei-Fei, L.³

4
- 84951072975
- Explain images with multimodal recurrent neural networks
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille, "Explain images with multimodal recurrent neural networks", NIPS Deep Learning Workshop, 2014.
- (2014) NIPS Deep Learning Workshop
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.L.⁵

5
- 84944046597
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description", In CVPR, 2014.
- (2014) CVPR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

6
- 84944115859
- Learning a recurrent visual representation for image caption generation
- X. Chen and C. L. Zitnick, "Learning a recurrent visual representation for image caption generation", CoRR, vol. abs/1411.5654, 2014.
- (2014) CoRR, vol. abs/1411.5654
- Chen, X.¹ Zitnick, C.L.²

7
- 84959250180
- From captions to visual concepts and back
- H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig, "From captions to visual concepts and back", In CVPR, 2015.
- (2015) CVPR
- Fang, H.¹ Gupta, S.² Iandola, F.N.³ Srivastava, R.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰ Zitnick, C.L.¹¹ Zweig, G.¹²

8
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention", In ICML, 2015.
- (2015) ICML
- Xu, K.¹ Ba, J.² Kiros, R.³ Cho, K.⁴ Courville, A.C.⁵ Salakhutdinov, R.⁶ Zemel, R.S.⁷ Bengio, Y.⁸

9
- 84970028761
- Phrase-based image captioning
- R. Lebret, P. O. Pinheiro, and R. Collobert, "Phrase-based image captioning", In ICML, 2015.
- (2015) ICML
- Lebret, R.¹ Pinheiro, P.O.² Collobert, R.³

10
- 84965125568
- Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotations
- B. Klein, G. Lev, G. Sadeh, and L. Wolf, "Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotations", In CVPR, 2015.
- (2015) CVPR
- Klein, B.¹ Lev, G.² Sadeh, G.³ Wolf, L.⁴

11
- 84951975735
- Towards a visual Turing challenge
- M. Malinowski and M. Fritz, "Towards a visual Turing challenge", In NIPS Workshop on Learning Semantics, 2014.
- (2014) NIPS Workshop on Learning Semantics
- Malinowski, M.¹ Fritz, M.²

12
- 84881536861
- Indoor segmentation and support inference from RGBD images
- N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images", In ECCV, 2012.
- (2012) ECCV
- Silberman, N.¹ Hoiem, D.² Kohli, P.³ Fergus, R.⁴

13
- 84959502295
- VQA: Visual question answering
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh, "VQA: Visual Question Answering", CoRR, vol. abs/1505.00468, 2015.
- (2015) CoRR, vol. abs/1505.00468
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Zitnick, C.L.⁶ Parikh, D.⁷

14
- 84957035520
- Ask your neurons: A neural-based approach to answering questions about images
- M. Malinowski, M. Rohrbach, and M. Fritz, "Ask Your Neurons: A Neural-based Approach to Answering Questions about Images", CoRR, vol. abs/1505.01121, 2015.
- (2015) CoRR, vol. abs/1505.01121
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

15
- 84957033954
- Are you talking to a machine? Dataset and methods for multilingual image question answering
- H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu, "Are you talking to a machine? dataset and methods for multilingual image question answering", CoRR, vol. abs/1505.05612, 2015.
- (2015) CoRR, vol. abs/1505.05612
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

16
- 84957021783
- Learning to answer questions from image using convolutional neural network
- L. Ma, Z. Lu, and H. Li, "Learning to answer questions from image using convolutional neural network", CoRR, vol. abs/1506.00333, 2015.
- (2015) CoRR, vol. abs/1506.00333
- Ma, L.¹ Lu, Z.² Li, H.³

17
- 84937834115
- Microsoft COCO: Common objects in context
- T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context", In ECCV, 2014.
- (2014) ECCV
- Lin, T.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

18
- 84952349295
- Microsoft COCO captions: Data collection and evaluation server
- X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick, "Microsoft COCO captions: Data collection and evaluation server", CoRR, vol. abs/1504.00325, 2015.
- (2015) CoRR, vol. abs/1504.00325
- Chen, X.¹ Fang, H.² Lin, T.-Y.³ Vedantam, R.⁴ Gupta, S.⁵ Dollar, P.⁶ Zitnick, C.L.⁷

19
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory", Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

20
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", In ICLR, 2015.
- (2015) ICLR
- Simonyan, K.¹ Zisserman, A.²

21
- 84947041871
- Imagenet large scale visual recognition challenge
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and L. Fei-Fei, "Imagenet large scale visual recognition challenge", IJCV, 2015.
- (2015) IJCV
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.S.¹⁰ Berg, A.C.¹¹ Fei-Fei, L.¹²

22
- 85083951332
- Efficient estimation of word representations in vector space
- T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space", In ICLR, 2013.
- (2013) ICLR
- Mikolov, T.¹ Chen, K.² Corrado, G.³ Dean, J.⁴

23
- 84898958665
- DeViSE: A deep visual-semantic embedding model
- A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. Mikolov, "DeViSE: A deep visual-semantic embedding model", In NIPS, 2013.
- (2013) NIPS
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Ranzato, M.⁶ Mikolov, T.⁷

24
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics", J. Artif. Intell. Res. (JAIR), vol. 47, pp. 853-899, 2013.
- (2013) J. Artif. Intell. Res. (JAIR) , vol.47 , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

25
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- V. Ordonez, G. Kulkarni, and T. L. Berg, "Im2text: Describing images using 1 million captioned photographs", In NIPS, 2011.
- (2011) NIPS
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

26
- 85146417759
- Accurate unlexicalized parsing
- D. Klein and C. D. Manning, "Accurate unlexicalized parsing", In ACL, 2003.
- (2003) ACL
- Klein, D.¹ Manning, C.D.²

27
- 84858398242
- Conditions on transformations
- N. Chomsky, Conditions on Transformations. New York: Academic Press, 1973.
- (1973) New York: Academic Press
- Chomsky, N.¹

28
- 0004289791
- WordNet: An electronic lexical database
- C. Fellbaum, Ed., WordNet: An Electronic Lexical Database. Cambridge, MA; London: The MIT Press, May 1998.
- (1998) Cambridge, MA; London: The MIT Press
- Fellbaum, C.¹

29
- 85107362379
- NLTK: The natural language toolkit
- S. Bird, "NLTK: the natural language toolkit", In ACL, 2006.
- (2006) ACL
- Bird, S.¹

30
- 84965102873
- Exploring nearest neighbor approaches for image captioning
- J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick, "Exploring nearest neighbor approaches for image captioning", CoRR, vol. abs/1505.04467, 2015.
- (2015) CoRR, vol. abs/1505.04467
- Devlin, J.¹ Gupta, S.² Girshick, R.³ Mitchell, M.⁴ Zitnick, C.L.⁵

31
- 85146676791
- Verb semantics and lexical selection
- Z. Wu and M. Palmer, "Verb semantics and lexical selection", In ACL, 1994.
- (1994) ACL
- Wu, Z.¹ Palmer, M.²

32
- 84937822746
- A multi-world approach to question answering about real-world scenes based on uncertain input
- M. Malinowski and M. Fritz, "A multi-world approach to question answering about real-world scenes based on uncertain input", In NIPS, 2014.
- (2014) NIPS
- Malinowski, M.¹ Fritz, M.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.