SCOPUS Information Retrieval Platform

Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

Volume 2017-January, 2017, Pages 3125-3134

Comprehension-guided referring expressions

(2) Luo, Ruotian (a); Shakhnarovich, Gregory (a)

(a) Toyota Technological Institute at Chicago (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION;

BENCHMARK DATASETS; COMPREHENSION TASKS; HUMAN EVALUATION; IMAGE CAPTIONING; NATURAL LANGUAGES; REFERRING EXPRESSIONS; STANDARD EVALUATIONS; TRAINING SIGNAL;

QUALITY CONTROL;

EID: 85041910212 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2017.33 Document Type: Conference Paper

Times cited : (193)

References (38)

1
- 85040916427
- 1604.00562V1
- J. Andreas and D. Klein. Reasoning About Pragmatics with Neural Listeners and Speakers. 1604.00562V1, 2016.
- (2016) Reasoning about Pragmatics with Neural Listeners and Speakers
- Andreas, J.¹ Klein, D.²

2
- 85018920030
- arXiv:1607.07086v1 [cs.LG]
- D. Bahdanau, P. Brakel, K. Xu, A. Goyal, R. Lowe, J. Pineau, A. Courville, and Y. Bengio. An Actor-Critic Algorithm for Sequence Prediction. arXiv:1607.07086v1 [cs.LG], 2016.
- (2016) An Actor-Critic Algorithm for Sequence Prediction
- Bahdanau, D.¹ Brakel, P.² Xu, K.³ Goyal, A.⁴ Lowe, R.⁵ Pineau, J.⁶ Courville, A.⁷ Bengio, Y.⁸

3
- 84965179228
- Scheduled sampling for sequence prediction with recurrent neural networks
- S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171-1179, 2015.
- (2015) Advances in Neural Information Processing Systems , pp. 1171-1179
- Bengio, S.¹ Vinyals, O.² Jaitly, N.³ Shazeer, N.⁴

4
- 85021662196
- arXiv
- B. Dhingra, H. Liu, W. W. Cohen, and R. Salakhutdinov. Gated-Attention Readers for Text Comprehension. arXiv, 2016.
- (2016) Gated-Attention Readers for Text Comprehension
- Dhingra, B.¹ Liu, H.² Cohen, W.W.³ Salakhutdinov, R.⁴

5
- 84959933549
- Neural Machine Translation by Jointly Learning to Align and Translate
- D. Bahdanau, K. Cho, and Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015, pages 1-15, 2014.
- (2014) ICLR 2015 , pp. 1-15
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

6
- 77649188328
- The segmented and annotated IAPR TC-12 benchmark
- H. J. Escalante, C. A. Hernández, J. A. Gonzalez, A. López-López, M. Montes, E. F. Morales, L. Enrique Sucar, L. Villaseñor, and M. Grubinger. The segmented and annotated IAPR TC-12 benchmark. Computer Vision and Image Understanding, 114(4):419-428, 2010.
- (2010) Computer Vision and Image Understanding , vol.114 , Issue.4 , pp. 419-428
- Escalante, H.J.¹ Hernández, C.A.² Gonzalez, J.A.³ López-López, A.⁴ Montes, M.⁵ Morales, E.F.⁶ Enrique Sucar, L.⁷ Villaseñor, L.⁸ Grubinger, M.⁹

7
- 84898958665
- Devise: A deep visual-semantic embedding model
- A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. Devise: A deep visual-semantic embedding model. In Advances in neural information processing systems, 2013.
- (2013) Advances in Neural Information Processing Systems
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Mikolov, T.⁶

8
- 84990060711
- arXiv
- A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. arXiv, 2016.
- (2016) Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
- Fukui, A.¹ Park, D.H.² Yang, D.³ Rohrbach, A.⁴ Darrell, T.⁵ Rohrbach, M.⁶

9
- 84964588182
- Fast r-cnn
- R. Girshick. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440-1448, 2015.
- (2015) Proceedings of the IEEE International Conference on Computer Vision , pp. 1440-1448
- Girshick, R.¹

10
- 84937849144
- Generative adversarial nets
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672-2680, 2014.
- (2014) Advances in Neural Information Processing Systems , pp. 2672-2680
- Goodfellow, I.¹ Pouget-Abadie, J.² Mirza, M.³ Xu, B.⁴ Warde-Farley, D.⁵ Ozair, S.⁶ Courville, A.⁷ Bengio, Y.⁸

11
- 84943527827
- arXiv preprint arXiv: 1412.6572
- I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- (2014) Explaining and Harnessing Adversarial Examples
- Goodfellow, I.J.¹ Shlens, J.² Szegedy, C.³

12
- 84890543083
- Speech recognition with deep recurrent neural networks. 2013
- IEEE
- A. Graves, A.-r. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pages 6645-6649. IEEE, 2013.
- (2013) IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 6645-6649
- Graves, A.¹ Mohamed, A.-R.² Hinton, G.³

13
- 38049183286
- The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems
- M. Grübinger, P. Clough, H. Müller, and T. Deselaers. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems. LREC Workshop OntoImage Language Resources for Content-Based Image Retrieval, pages 13-23, 2006.
- (2006) LREC Workshop OntoImage Language Resources for Content-Based Image Retrieval , pp. 13-23
- Grübinger, M.¹ Clough, P.² Müller, H.³ Deselaers, T.⁴

14
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

15
- 84986305787
- arXiv preprint
- R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell. Natural Language Object Retrieval. arXiv preprint, pages 4555-4564, 2015.
- (2015) Natural Language Object Retrieval , pp. 4555-4564
- Hu, R.¹ Xu, H.² Rohrbach, M.³ Feng, J.⁴ Saenko, K.⁵ Darrell, T.⁶

16
- 84986278097
- arXiv preprint
- J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully Convolutional Localization Networks for Dense Captioning. arXiv preprint, 2015.
- (2015) DenseCap: Fully Convolutional Localization Networks for Dense Captioning
- Johnson, J.¹ Karpathy, A.² Fei-Fei, L.³

17
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pages 3128-3137, 2015.
- (2015) Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on , pp. 3128-3137
- Karpathy, A.¹ Fei-Fei, L.²

18
- 84943540775
- ReferItGame: Referring to Objects in Photographs of Natural Scenes
- S. Kazemzadeh, V. Ordonez, M. Matten, and T. L. Berg. ReferItGame: Referring to Objects in Photographs of Natural Scenes. EMNLP, pages 787-798, 2014.
- (2014) EMNLP , pp. 787-798
- Kazemzadeh, S.¹ Ordonez, V.² Matten, M.³ Berg, T.L.⁴

19
- 84941620184
- arXiv preprint arXiv: 1412.6980
- D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

20
- 84906493406
- Microsoft coco: Common objects in context
- Springer
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In European Conference on Computer Vision, pages 740-755. Springer, 2014.
- (2014) European Conference on Computer Vision , pp. 740-755
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

21
- 85040306674
- C. Liu, J. Mao, F. Sha, and A. Yuille. Attention Correctness in Neural Image Captioning. pages 1-11, 2016.
- (2016) Attention Correctness in Neural Image Captioning , pp. 1-11
- Liu, C.¹ Mao, J.² Sha, F.³ Yuille, A.⁴

22
- 84973864182
- Multimodal convolutional neural networks for matching image and sentence
- Dece
- L. Ma, Z. Lu, L. Shang, and H. Li. Multimodal convolutional neural networks for matching image and sentence. Proceedings of the IEEE International Conference on Computer Vision, 11-18-Dece:2623-2631, 2016.
- (2016) Proceedings of the IEEE International Conference on Computer Vision , vol.11-18 , pp. 2623-2631
- Ma, L.¹ Lu, Z.² Shang, L.³ Li, H.⁴

23
- 84986260074
- Generation and Comprehension of Unambiguous Object Descriptions
- J. Mao, J. Huang, A. Toshev, O. Camburu, A. Yuille, and K. Murphy. Generation and Comprehension of Unambiguous Object Descriptions. CVPR, pages 11-20, 2016.
- (2016) CVPR , pp. 11-20
- Mao, J.¹ Huang, J.² Toshev, A.³ Camburu, O.⁴ Yuille, A.⁵ Murphy, K.⁶

24
- 85083951332
- arXiv preprint arXiv: 1301.3781
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
- (2013) Efficient Estimation of Word Representations in Vector Space
- Mikolov, T.¹ Chen, K.² Corrado, G.³ Dean, J.⁴

25
- 85021826252
- Modeling Context between Objects for Referring Expression Understanding
- V. K. Nagaraja, V. I. Morariu, and L. S. Davis. Modeling Context Between Objects for Referring Expression Understanding. ECCV, 2016.
- (2016) ECCV
- Nagaraja, V.K.¹ Morariu, V.I.² Davis, L.S.³

26
- 84978298377
- arXiv
- A. Radford, L. Metz, and S. Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv, pages 1-15, 2015.
- (2015) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , pp. 1-15
- Radford, A.¹ Metz, L.² Chintala, S.³

27
- 85083951479
- Sequence Level Training with Recurrent Neural Networks
- M. Ranzato, S. Chopra, M. Auli, and W. Zaremba. Sequence Level Training with Recurrent Neural Networks. ICLR, pages 1-15, 2016.
- (2016) ICLR , pp. 1-15
- Ranzato, M.¹ Chopra, S.² Auli, M.³ Zaremba, W.⁴

28
- 84986250442
- Learning Deep Representations of Fine-Grained Visual Descriptions
- S. Reed, Z. Akata, H. Lee, and B. Schiele. Learning Deep Representations of Fine-Grained Visual Descriptions. CVPR, pages 49-58, 2016.
- (2016) CVPR , pp. 49-58
- Reed, S.¹ Akata, Z.² Lee, H.³ Schiele, B.⁴

29
- 85044386408
- 1511.03745V1
- A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. Grounding of Textual Phrases in Images by Reconstruction. 1511.03745V1, 1:1-10, 2015.
- (2015) Grounding of Textual Phrases in Images by Reconstruction , vol.1 , pp. 1-10
- Rohrbach, A.¹ Rohrbach, M.² Hu, R.³ Darrell, T.⁴ Schiele, B.⁵

30
- 84925410541
- arXiv preprint arXiv: 1409.1556
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

31
- 84992670816
- arXiv preprint, (2005)
- I. Vendrov, R. Kiros, S. Fidler, and R. Urtasun. Order-Embeddings of Images and Language. arXiv preprint, (2005):1-13, 2015.
- (2015) Order-Embeddings of Images and Language , pp. 1-13
- Vendrov, I.¹ Kiros, R.² Fidler, S.³ Urtasun, R.⁴

32
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3156-3164, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 3156-3164
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

33
- 84986271102
- Learning Deep Structure-Preserving Image-Text Embeddings
- L. Wang, Y. Li, and S. Lazebnik. Learning Deep Structure-Preserving Image-Text Embeddings. CVPR, (Figure 1):5005-5013, 2016.
- (2016) CVPR (Figure 1) , pp. 5005-5013
- Wang, L.¹ Li, Y.² Lazebnik, S.³

34
- 84867117593
- Wsabie: Scaling up to large vocabulary image annotation
- J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI, 2011.
- (2011) Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI
- Weston, J.¹ Bengio, S.² Usunier, N.³

35
- 84970002232
- Show Attend and Tell: Neural Image Caption Generation with Visual Attention
- K. Xu, J. L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML-2015, 2015.
- (2015) ICML-2015
- Xu, K.¹ Ba, J.L.² Kiros, R.³ Cho, K.⁴ Courville, A.⁵ Salakhutdinov, R.⁶ Zemel, R.S.⁷ Bengio, Y.⁸

36
- 84990061297
- Modeling Context in Referring Expressions
- L. Yu, P. Poirson, S. Yang, A. C. Berg, and T. L. Berg. Modeling Context in Referring Expressions. In ECCV, 2016.
- (2016) ECCV
- Yu, L.¹ Poirson, P.² Yang, S.³ Berg, A.C.⁴ Berg, T.L.⁵

37
- 85019049571
- L. Yu, W. Zhang, J. Wang, and Y. Yu. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. 2016.
- (2016) SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
- Yu, L.¹ Zhang, W.² Wang, J.³ Yu, Y.⁴

38
- 84952018709
- Edge boxes: Locating object proposals from edges
- September
- L. Zitnick and P. Dollar. Edge boxes: Locating object proposals from edges. In ECCV. European Conference on Computer Vision, September 2014.
- (2014) ECCV. European Conference on Computer Vision
- Zitnick, L.¹ Dollar, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.