[1] Torch. http://torch.ch/.
[2] H. Agrawal, A. Chandrasekaran, D. Batra, D. Parikh, and M. Bansal. Sort story: Sorting jumbled images and captions into stories. In EMNLP, 2016.
[3] Amazon. Alexa. http://alexa.amazon.com/.
[4] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual Question Answering. In ICCV, 2015.
[5] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, and T. Yeh. VizWiz: Nearly Real-time Answers to Visual Questions. In UIST, 2010.
[8] G. Christie, A. Laddha, A. Agrawal, S. Antol, Y. Goyal, K. Kochersberger, and D. Batra. Resolving language and vision ambiguities together: Joint segmentation and prepositional attachment resolution in captioned scenes. In EMNLP, 2016.
[9] A. Das, H. Agrawal, C. L. Zitnick, D. Parikh, and D. Batra. Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? In EMNLP, 2016.
[10] H. de Vries, F. Strub, S. Chandar, O. Pietquin, H. Larochelle, and A. C. Courville. GuessWhat?! Visual object discovery through multi-modal dialogue. In CVPR, 2017.
[11] J. Dodge, A. Gane, X. Zhang, A. Bordes, S. Chopra, A. Miller, A. Szlam, and J. Weston. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems. In ICLR, 2016.
[12] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term Recurrent Convolutional Networks for Visual Recognition and Description. In CVPR, 2015.
[13] H. Fang, S. Gupta, F. N. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From Captions to Visual Concepts and Back. In CVPR, 2015.
[14] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. In NIPS, 2015.
[16] Y. Goyal, T. Khot, D. Summers-Stay, D. Batra, and D. Parikh. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. In CVPR, 2017.
[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016.
[18] K. M. Hermann, T. Kocisky, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman, and P. Blunsom. Teaching machines to read and comprehend. In NIPS, 2015.
[19] R. Hu, M. Rohrbach, and T. Darrell. Segmentation from natural language expressions. In ECCV, 2016.
[20] T.-H. Huang, F. Ferraro, N. Mostafazadeh, I. Misra, A. Agrawal, J. Devlin, R. Girshick, X. He, P. Kohli, D. Batra, L. Zitnick, D. Parikh, L. Vanderwende, M. Galley, and M. Mitchell. Visual storytelling. In NAACL HLT, 2016.
[21] A. Jabri, A. Joulin, and L. van der Maaten. Revisiting visual question answering baselines. In ECCV, 2016.
[22] A. Kannan, K. Kurach, S. Ravi, T. Kaufmann, A. Tomkins, B. Miklos, G. Corrado, L. Lukács, M. Ganea, P. Young, et al. Smart Reply: Automated Response Suggestion for Email. In KDD, 2016.
[23] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[24] C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? Text-to-image coreference. In CVPR, 2014.
[25] O. Lemon, K. Georgila, J. Henderson, and M. Stuttle. An ISU dialogue system exhibiting reinforcement learning of dialogue policies: Generic slot-filling in the TALK in-car system. In EACL, 2006.
[26] J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, and D. Jurafsky. Deep Reinforcement Learning for Dialogue Generation. In EMNLP, 2016.
[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
[28] C.-W. Liu, R. Lowe, I. V. Serban, M. Noseworthy, L. Charlin, and J. Pineau. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. In EMNLP, 2016.
[29] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector. In ECCV, 2016.
[30] R. Lowe, N. Pow, I. Serban, and J. Pineau. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. In SIGDIAL, 2015.
[32] J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical Question-Image Co-Attention for Visual Question Answering. In NIPS, 2016.
[33] M. Malinowski and M. Fritz. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In NIPS, 2014.
[34] M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
[35] H. Mei, M. Bansal, and M. R. Walter. Listen, attend, and walk: Neural mapping of navigational instructions to action sequences. In AAAI, 2016.
[36] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
[37] N. Mostafazadeh, C. Brockett, B. Dolan, M. Galley, J. Gao, G. P. Spithourakis, and L. Vanderwende. Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation. arXiv preprint arXiv:1701.08251, 2017.
[39] B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2015.
[40] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In EMNLP, 2016.
[41] V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking people with "their" names using coreference resolution. In ECCV, 2014.
[42] A. Ray, G. Christie, M. Bansal, D. Batra, and D. Parikh. Question Relevance in VQA: Identifying Non-Visual and False-Premise Questions. In EMNLP, 2016.
[43] M. Ren, R. Kiros, and R. Zemel. Exploring Models and Data for Image Question Answering. In NIPS, 2015.
[44] A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. Grounding of textual phrases in images by reconstruction. In ECCV, 2016.
[46] I. V. Serban, A. García-Durán, Ç. Gülçehre, S. Ahn, S. Chandar, A. C. Courville, and Y. Bengio. Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus. In ACL, 2016.
[47] I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models. In AAAI, 2016.
[48] I. V. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. Courville, and Y. Bengio. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues. arXiv preprint arXiv:1605.06069, 2016.
[49] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
[50] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[51] M. Tapaswi, Y. Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, and S. Fidler. MovieQA: Understanding Stories in Movies through Question-Answering. In CVPR, 2016.
[52] K. Tu, M. Meng, M. W. Lee, T. E. Choe, and S. C. Zhu. Joint Video and Text Parsing for Understanding Events and Answering Queries. IEEE MultiMedia, 2014.
[53] S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko. Sequence to Sequence - Video to Text. In ICCV, 2015.
[54] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. J. Mooney, and K. Saenko. Translating Videos to Natural Language Using Deep Recurrent Neural Networks. In NAACL HLT, 2015.
[57] L. Wang, S. Guo, W. Huang, Y. Xiong, and Y. Qiao. Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs. arXiv preprint arXiv:1610.01119, 2016.
[58] J. Weizenbaum. ELIZA. http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm.
[59] J. Weston, A. Bordes, S. Chopra, and T. Mikolov. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. In ICLR, 2016.
[61] Z. Yang, X. He, J. Gao, L. Deng, and A. J. Smola. Stacked Attention Networks for Image Question Answering. In CVPR, 2016.
[62] P. Zhang, Y. Goyal, D. Summers-Stay, D. Batra, and D. Parikh. Yin and Yang: Balancing and Answering Binary Visual Questions. In CVPR, 2016.
[64] C. L. Zitnick, A. Agrawal, S. Antol, M. Mitchell, D. Batra, and D. Parikh. Measuring machine intelligence through visual question answering. AI Magazine, 2016.