[1] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual question answering. arXiv preprint arXiv:1505.00468, 2015.
[2] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, et al. VizWiz: Nearly real-time answers to visual questions. In ACM Symposium on User Interface Software and Technology, pages 333-342, 2010.
[3] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In ICLR, 2015.
[4] X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. In CVPR, 2015.
[5] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[6] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[7] J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179-211, 1990.
[8] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. In CVPR, 2015.
[9] D. Geman, S. Geman, N. Hallonquist, and L. Younes. Visual Turing test for computer vision systems. PNAS, 112(12):3618-3623, 2015.
[10] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[11] M. Grubinger, P. Clough, H. Müller, and T. Deselaers. The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, pages 13-23, 2006.
[13] N. Kalchbrenner and P. Blunsom. Recurrent continuous translation models. In EMNLP, pages 1700-1709, 2013.
[14] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[15] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[18] A. Lavie and A. Agarwal. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgements. In Workshop on Statistical Machine Translation, pages 228-231. Association for Computational Linguistics, 2007.
[21] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312, 2014.
[22] M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS, pages 1682-1690, 2014.
[24] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
[25] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. arXiv preprint arXiv:1504.06692, 2015.
[26] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. In NIPS Deep Learning Workshop, 2014.
[27] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, and M. Ranzato. Learning longer memory in recurrent neural networks. arXiv preprint arXiv:1412.7753, 2014.
[28] T. Mikolov, M. Karafiát, L. Burget, J. Cernockỳ, and S. Khudanpur. Recurrent neural network based language model. In INTERSPEECH, pages 1045-1048, 2010.
[29] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119, 2013.
[30] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, pages 807-814, 2010.
[31] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311-318, 2002.
[33] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014.
[34] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[35] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104-3112, 2014.
[36] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
[37] K. Tu, M. Meng, M. W. Lee, T. E. Choe, and S.-C. Zhu. Joint video and text parsing for understanding events and answering queries. IEEE MultiMedia, 21(2):42-70, 2014.
[38] A. M. Turing. Computing machinery and intelligence. Mind, pages 433-460, 1950.
[39] R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015.
[41] Z. Wu and M. Palmer. Verb semantics and lexical selection. In ACL, pages 133-138, 1994.
[42] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
[43] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In ACL, pages 479-488, 2014.
[44] J. Zhu, J. Mao, and A. L. Yuille. Learning from weakly supervised data by the expectation loss SVM (e-SVM) algorithm. In NIPS, pages 1125-1133, 2014.