1. Nicolas Ballas, Li Yao, Chris Pal, and Aaron Courville. Delving deeper into convolutional networks for learning video representations. ICLR, 2016.
2. A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, et al. Video in sentences out. UAI, 2012.
4. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137–1155, 2003.
5. James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. JMLR, 2012.
6. Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
7. Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP, 2014.
8. Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geoffrey Zweig, and Margaret Mitchell. Language models for image captioning: The quirks and what works. arXiv preprint arXiv:1505.01809, 2015.
9. Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, and C. Lawrence Zitnick. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
10. Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. Long-term recurrent convolutional networks for visual recognition and description. CVPR, 2015.
11. Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt, et al. From captions to visual concepts and back. CVPR, 2015.
12. Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, and Kate Saenko. YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. ICCV, 2013.
13. Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
17. A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CVPR, 2014.
19. Atsuhiro Kojima, Takeshi Tamura, and Kunio Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. IJCV, 2002.
20. Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. BabyTalk: Understanding and generating simple image descriptions. PAMI, 2013.
21. Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg, and Yejin Choi. Collective generation of natural image descriptions. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 359–368. Association for Computational Linguistics, 2012.
22. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision - ECCV 2014, pages 740–755. Springer, 2014.
23. Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). ICLR, 2015.
24. Margaret Mitchell, Xufeng Han, Jesse Dodge, Alyssa Mensch, Amit Goyal, Alex Berg, Kota Yamaguchi, Tamara Berg, Karl Stratos, and Hal Daumé III. Midge: Generating image descriptions from computer vision detections. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 747–756. Association for Computational Linguistics, 2012.
25. Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, and Anthony Dick. What value high level concepts in vision to language problems? arXiv preprint arXiv:1506.01144, 2015.
28. Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, and Bernt Schiele. Translating video content to natural language descriptions. ICCV, 2013.
31. Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. CIDEr: Consensus-based image description evaluation. CVPR, 2015.
32. Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, and Kate Saenko. Sequence to sequence - video to text. ICCV, 2015.
33. Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, and Kate Saenko. Translating videos to natural language using deep recurrent neural networks. NAACL, 2015.
34. Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. CVPR, 2014.
35. Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, and Kate Saenko. A multi-scale multiple instance video description network. arXiv preprint arXiv:1505.05914, 2015.
36. Kelvin Xu, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. ICML, 2015.
37. Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, and Aaron Courville. Describing videos by exploiting temporal structure. ICCV, 2015.
39. Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, and Wei Xu. Video paragraph captioning using hierarchical recurrent neural networks. arXiv preprint arXiv:1510.07712, 2015.