SCOPUS Information Search Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 4622-4630

Ask me anything: Free-form visual question answering based on knowledge from external sources

(5) Wu, Qi (a); Wang, Peng (a); Shen, Chunhua (a); Dick, Anthony (a); Van Den Hengel, Anton (a)

(a) University of Adelaide (Australia)

Author keywords

[No Author keywords available]

Indexed keywords

COMPLEX NETWORKS; COMPUTER VISION; KNOWLEDGE BASED SYSTEMS; PATTERN RECOGNITION; RECURRENT NEURAL NETWORKS; SEMANTICS;

COMBINED INFORMATIONS; COMPLEX QUESTIONS; INTERNAL REPRESENTATION; NATURAL LANGUAGES; NETWORK-BASED APPROACH; QUESTION ANSWERING; TEXTUAL INFORMATION; TEXTUAL REPRESENTATION;

NATURAL LANGUAGE PROCESSING SYSTEMS;

EID: 84986320870 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.500 Document Type: Conference Paper

Times cited: 413

References (33)

1
- 84973890960
- VQA: Visual question answering
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual Question Answering. In Proc. IEEE Int. Conf. Comp. Vis., 2015.
- (2015) Proc. IEEE Int. Conf. Comp. Vis.
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Zitnick, C.L.⁶ Parikh, D.⁷

2
- 70350086542
- Springer
- S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. Dbpedia: A nucleus for a web of open data. Springer, 2007.
- (2007) Dbpedia: A Nucleus for A Web of Open Data
- Auer, S.¹ Bizer, C.² Kobilarov, G.³ Lehmann, J.⁴ Cyganiak, R.⁵ Ives, Z.⁶

3
- 84904308637
- Semantic parsing on freebase from question-answer pairs
- J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic Parsing on Freebase from Question-Answer Pairs. In Proc. Conf. Empirical Methods in Natural Language Processing, pages 1533-1544, 2013.
- (2013) Proc. Conf. Empirical Methods in Natural Language Processing, pp. 1533-1544
- Berant, J.¹ Chou, A.² Frostig, R.³ Liang, P.⁴

4
- 57149137628
- Freebase: A collaboratively created graph database for structuring human knowledge
- K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247-1250. ACM, 2008.
- (2008) Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247-1250
- Bollacker, K.¹ Evans, C.² Paritosh, P.³ Sturge, T.⁴ Taylor, J.⁵

5
- 84952349295
- arXiv:1504.00325
- X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv:1504.00325, 2015.
- (2015) Microsoft COCO Captions: Data Collection and Evaluation Server
- Chen, X.¹ Fang, H.² Lin, T.-Y.³ Vedantam, R.⁴ Gupta, S.⁵ Dollar, P.⁶ Zitnick, C.L.⁷

6
- 84957029470
- Learning a Recurrent Visual Representation for Image Caption Generation
- X. Chen and C. L. Zitnick. Learning a Recurrent Visual Representation for Image Caption Generation. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Chen, X.¹ Zitnick, C.L.²

7
- 84961291190
- Learning phrase representations using rnn encoder-decoder for statistical machine translation
- K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Proc. Conf. Empirical Methods in Natural Language Processing, 2014.
- (2014) Proc. Conf. Empirical Methods in Natural Language Processing
- Cho, K.¹ Van Merrienboer, B.² Gulcehre, C.³ Bougares, F.⁴ Schwenk, H.⁵ Bengio, Y.⁶

8
- 72449136144
- Imagenet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2009.
- (2009) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

9
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

10
- 79953685181
- Building Watson: An overview of the DeepQA project
- D. Ferrucci, E. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. Prager, et al. Building Watson: An overview of the DeepQA project. AI magazine, 31(3):59-79, 2010.
- (2010) AI Magazine, vol. 31, Issue 3, pp. 59-79
- Ferrucci, D.¹ Brown, E.² Chu-Carroll, J.³ Fan, J.⁴ Gondek, D.⁵ Kalyanpur, A.A.⁶ Lally, A.⁷ Murdock, J.W.⁸ Nyberg, E.⁹ Prager, J.¹⁰

11
- 84965148420
- Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
- H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. In Proc. Advances in Neural Inf. Process. Syst., 2015.
- (2015) Proc. Advances in Neural Inf. Process. Syst.
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

12
- 84925422907
- Visual Turing test for computer vision systems
- D. Geman, S. Geman, N. Hallonquist, and L. Younes. Visual Turing test for computer vision systems. Proceedings of the National Academy of Sciences, 112(12):3618-3623, 2015.
- (2015) Proceedings of the National Academy of Sciences, vol. 112, Issue 12, pp. 3618-3623
- Geman, D.¹ Geman, S.² Hallonquist, N.³ Younes, L.⁴

13
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.
- (1997) Neural Computation, vol. 9, Issue 8, pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

14
- 84937843643
- Deep fragment embeddings for bidirectional image sentence mapping
- A. Karpathy, A. Joulin, and F. F. Li. Deep fragment embeddings for bidirectional image sentence mapping. In Proc. Advances in Neural Inf. Process. Syst., 2014.
- (2014) Proc. Advances in Neural Inf. Process. Syst.
- Karpathy, A.¹ Joulin, A.² Li, F.F.³

15
- 84912054048
- arXiv preprint arXiv:1405.4053
- Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. arXiv preprint arXiv:1405.4053, 2014.
- (2014) Distributed Representations of Sentences and Documents
- Le, Q.V.¹ Mikolov, T.²

16
- 84959227898
- Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks
- X. Lin and D. Parikh. Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., June 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Lin, X.¹ Parikh, D.²

17
- 84957021783
- arXiv:1506.00333
- L. Ma, Z. Lu, and H. Li. Learning to Answer Questions From Image using Convolutional Neural Network. arXiv:1506.00333, 2015.
- (2015) Learning to Answer Questions from Image Using Convolutional Neural Network
- Ma, L.¹ Lu, Z.² Li, H.³

18
- 84937822746
- A multi-world approach to question answering about real-world scenes based on uncertain input
- M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In Proc. Advances in Neural Inf. Process. Syst., pages 1682-1690, 2014.
- (2014) Proc. Advances in Neural Inf. Process. Syst., pp. 1682-1690
- Malinowski, M.¹ Fritz, M.²

19
- 84951975735
- arXiv:1410.8027
- M. Malinowski and M. Fritz. Towards a Visual Turing Challenge. arXiv:1410.8027, 2014.
- (2014) Towards A Visual Turing Challenge
- Malinowski, M.¹ Fritz, M.²

20
- 84973896625
- Ask your neurons: A neural-based approach to answering questions about images
- M. Malinowski, M. Rohrbach, and M. Fritz. Ask Your Neurons: A Neural-based Approach to Answering Questions about Images. In Proc. IEEE Int. Conf. Comp. Vis., 2015.
- (2015) Proc. IEEE Int. Conf. Comp. Vis.
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

21
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-rnn)
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN). In Proc. Int. Conf. Learn. Representations, 2015.
- (2015) Proc. Int. Conf. Learn. Representations
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.⁵

22
- 84973900209
- arXiv:1503.00848, March
- J. Pont-Tuset, P. Arbeláez, J. Barron, F. Marques, and J. Malik. Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation. In arXiv:1503.00848, March 2015.
- (2015) Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation
- Pont-Tuset, J.¹ Arbeláez, P.² Barron, J.³ Marques, F.⁴ Malik, J.⁵

23
- 84962816362
- Image Question Answering: A Visual Semantic Embedding Model and a New Dataset
- M. Ren, R. Kiros, and R. Zemel. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset. In Proc. Advances in Neural Inf. Process. Syst., 2015.
- (2015) Proc. Advances in Neural Inf. Process. Syst.
- Ren, M.¹ Kiros, R.² Zemel, R.³

24
- 84959184467
- VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases
- F. Sadeghi, S. K. Kumar Divvala, and A. Farhadi. VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., June 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Sadeghi, F.¹ Kumar Divvala, S.K.² Farhadi, A.³

25
- 84925410541
- arXiv:1409.1556
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

26
- 84928547704
- Sequence to sequence learning with neural networks
- I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Inf. Process. Syst., 2014.
- (2014) Proc. Advances in Neural Inf. Process. Syst.
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

27
- 84901405262
- Joint video and text parsing for understanding events and answering queries
- K. Tu, M. Meng, M.W. Lee, T. E. Choe, and S.-C. Zhu. Joint video and text parsing for understanding events and answering queries. IEEE Trans. Multimedia, 21(2):42-70, 2014.
- (2014) IEEE Trans. Multimedia, vol. 21, Issue 2, pp. 42-70
- Tu, K.¹ Meng, M.² Lee, M.W.³ Choe, T.E.⁴ Zhu, S.-C.⁵

28
- 84939821075
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2014.
- (2014) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

29
- 84938908409
- arXiv:1406.5726
- Y. Wei, W. Xia, J. Huang, B. Ni, J. Dong, Y. Zhao, and S. Yan. CNN: Single-label to multi-label. arXiv:1406.5726, 2014.
- (2014) CNN: Single-label to Multi-label
- Wei, Y.¹ Xia, W.² Huang, J.³ Ni, B.⁴ Dong, J.⁵ Zhao, Y.⁶ Yan, S.⁷

30
- 84986301177
- What Value Do Explicit High Level Concepts Have in Vision to Language Problems?
- Q. Wu, C. Shen, A. v. d. Hengel, L. Liu, and A. Dick. What Value Do Explicit High Level Concepts Have in Vision to Language Problems? In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2016.
- (2016) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Wu, Q.¹ Shen, C.² Hengel, A.V.D.³ Liu, L.⁴ Dick, A.⁵

31
- 85146676791
- Verbs semantics and lexical selection
- Z. Wu and M. Palmer. Verbs semantics and lexical selection. In Proc. Conf. Association for Computational Linguistics, 1994.
- (1994) Proc. Conf. Association for Computational Linguistics
- Wu, Z.¹ Palmer, M.²

32
- 84965160010
- arXiv:1502.08029
- L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. arXiv:1502.08029, 2015.
- (2015) Describing Videos by Exploiting Temporal Structure
- Yao, L.¹ Torabi, A.² Cho, K.³ Ballas, N.⁴ Pal, C.⁵ Larochelle, H.⁶ Courville, A.⁷

33
- 84986248327
- arXiv:1507.05670
- Y. Zhu, C. Zhang, C. Ré, and L. Fei-Fei. Building a Largescale Multimodal Knowledge Base for Visual Question Answering. arXiv:1507.05670, 2015.
- (2015) Building A Largescale Multimodal Knowledge Base for Visual Question Answering
- Zhu, Y.¹ Zhang, C.² Ré, C.³ Fei-Fei, L.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.