SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 4584-4593

Video paragraph captioning using hierarchical recurrent neural networks

(5) Yu, Haonan a Wang, Jiang c Huang, Zhiheng b Yang, Yi c Xu, Wei c

a PURDUE UNIVERSITY (United States)

b FACEBOOK (United States)

c BAIDU INC (China)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; PATTERN RECOGNITION;

BENCHMARK DATASETS; INITIAL STATE; RECURRENT NEURAL NETWORK (RNNS); STATE-OF-THE-ART METHODS; TEMPORAL AND SPATIAL; VISUAL ELEMENTS;

RECURRENT NEURAL NETWORKS;

EID: 84986275061 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.496 Document Type: Conference Paper

Times cited : (596)

References (57)

1
- 85083953689
- Neural machine translation by jointly learning to align and translate
- D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations, 2015.
- (2015) International Conference on Learning Representations
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

2
- 85116156579
- Meteor: An automatic metric for mt evaluation with improved correlation with human judgments
- June
- S. Banerjee and A. Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65-72, June 2005.
- (2005) Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , pp. 65-72
- Banerjee, S.¹ Lavie, A.²

3
- 84885996388
- Video in sentences out
- A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, N. Siddharth, D. Salvi, L. Schmidt, J. Shangguan, J. M. Siskind, J. Waggoner, S. Wang, J. Wei, Y. Yin, and Z. Zhang. Video in sentences out. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 102-112, 2012.
- (2012) Proceedings of the Conference on Uncertainty in Artificial Intelligence , pp. 102-112
- Barbu, A.¹ Bridge, A.² Burchill, Z.³ Coroian, D.⁴ Dickinson, S.⁵ Fidler, S.⁶ Michaux, A.⁷ Mussman, S.⁸ Siddharth, N.⁹ Salvi, D.¹⁰ Schmidt, L.¹¹ Shangguan, J.¹² Siskind, J.M.¹³ Waggoner, J.¹⁴ Wang, S.¹⁵ Wei, J.¹⁶ Yin, Y.¹⁷ Zhang, Z.¹⁸

4
- 84965179228
- Scheduled sampling for sequence prediction with recurrent neural networks
- S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171-1179, 2015.
- (2015) Advances in Neural Information Processing Systems , pp. 1171-1179
- Bengio, S.¹ Vinyals, O.² Jaitly, N.³ Shazeer, N.⁴

5
- 84961338285
- Question answering with subgraph embeddings
- A. Bordes, S. Chopra, and J. Weston. Question answering with subgraph embeddings. In Conference on Empirical Methods in Natural Language Processing, pages 615-620, 2014.
- (2014) Conference on Empirical Methods in Natural Language Processing , pp. 615-620
- Bordes, A.¹ Chopra, S.² Weston, J.³

6
- 84859089502
- Collecting highly parallel data for paraphrase evaluation
- D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011), Portland, OR, June 2011.
- (2011) Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-2011), Portland, OR, June
- Chen, D.L.¹ Dolan, W.B.²

7
- 84986240725
- Microsoft COCO captions: Data collection and evaluation server
- abs/1504.00325
- X. Chen, H. Fang, T. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. CoRR, abs/1504.00325, 2015.
- (2015) CoRR
- Chen, X.¹ Fang, H.² Lin, T.³ Vedantam, R.⁴ Gupta, S.⁵ Dollár, P.⁶ Zitnick, C.L.⁷

8
- 84957029470
- Learning a recurrent visual representation for image caption generation
- X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Chen, X.¹ Zitnick, C.L.²

9
- 84961291190
- Learning phrase representations using RNN encoder-decoder for statistical machine translation
- K. Cho, B. van Merrienboer, Ç. Gülçehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing, 2014.
- (2014) Conference on Empirical Methods in Natural Language Processing
- Cho, K.¹ Van Merrienboer, B.² Gülçehre, Ç.³ Bougares, F.⁴ Schwenk, H.⁵ Bengio, Y.⁶

10
- 84887345951
- A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
- P. Das, C. Xu, R. F. Doell, and J. J. Corso. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2634-2641, 2013.
- (2013) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 2634-2641
- Das, P.¹ Xu, C.² Doell, R.F.³ Corso, J.J.⁴

11
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

12
- 26444565569
- Finding structure in time
- J. L. Elman. Finding structure in time. COGNITIVE SCIENCE, 14(2):179-211, 1990.
- (1990) COGNITIVE SCIENCE , vol.14 , Issue.2 , pp. 179-211
- Elman, J.L.¹

13
- 35248856569
- Two-frame motion estimation based on polynomial expansion
- G. Farnebäck. Two-frame motion estimation based on polynomial expansion. In Proceedings of the 13th Scandinavian Conference on Image Analysis, pages 363-370, 2003.
- (2003) Proceedings of the 13th Scandinavian Conference on Image Analysis , pp. 363-370
- Farnebäck, G.¹

14
- 84898773262
- Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
- S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, T. Darrell, R. Mooney, and K. Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV'13 Int. Conf. on Computer Vision 2013, December 2013.
- (2013) ICCV'13 Int. Conf. on Computer Vision 2013, December
- Guadarrama, S.¹ Krishnamoorthy, N.² Malkarnenkar, G.³ Venugopalan, S.⁴ Darrell, T.⁵ Mooney, R.⁶ Saenko, K.⁷

15
- 84867724463
- Automated textual descriptions for a wide range of video events with 48 human actions
- P. Hanckmann, K. Schutte, and G. J. Burghouts. Automated textual descriptions for a wide range of video events with 48 human actions. In Proceedings of the European Conference on Computer Vision Workshops and Demonstrations, pages 372-380, 2012.
- (2012) Proceedings of the European Conference on Computer Vision Workshops and Demonstrations , pp. 372-380
- Hanckmann, P.¹ Schutte, K.² Burghouts, G.J.³

16
- 0031573117
- Long short-term memory
- Nov.
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735-1780, Nov. 1997.
- (1997) Neural Comput. , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

17
- 84865584175
- Aggregating local image descriptors into compact codes
- Sept.
- H. Jegou, F. Perronnin, M. Douze, J. Sánchez, P. Perez, and C. Schmid. Aggregating local image descriptors into compact codes. IEEE Trans. Pattern Anal. Mach. Intell., 34(9):1704-1716, Sept. 2012.
- (2012) IEEE Trans. Pattern Anal. Mach. Intell. , vol.34 , Issue.9 , pp. 1704-1716
- Jegou, H.¹ Perronnin, F.² Douze, M.³ Sánchez, J.⁴ Perez, P.⁵ Schmid, C.⁶

18
- 84926283798
- Recurrent continuous translation models
- N. Kalchbrenner and P. Blunsom. Recurrent continuous translation models. In Conference on Empirical Methods in Natural Language Processing, pages 1700-1709, 2013.
- (2013) Conference on Empirical Methods in Natural Language Processing , pp. 1700-1709
- Kalchbrenner, N.¹ Blunsom, P.²

19
- 84911364368
- Large-scale video classification with convolutional neural networks
- A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
- (2014) CVPR
- Karpathy, A.¹ Toderici, G.² Shetty, S.³ Leung, T.⁴ Sukthankar, R.⁵ Fei-Fei, L.⁶

20
- 84863029475
- Human focused video description
- M. U. G. Khan, L. Zhang, and Y. Gotoh. Human focused video description. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 1480-1487, 2011.
- (2011) Proceedings of the IEEE International Conference on Computer Vision Workshops , pp. 1480-1487
- Khan, M.U.G.¹ Zhang, L.² Gotoh, Y.³

21
- 84863075153
- Towards coherent natural language description of video streams
- M. U. G. Khan, L. Zhang, and Y. Gotoh. Towards coherent natural language description of video streams. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 664-671, 2011.
- (2011) Proceedings of the IEEE International Conference on Computer Vision Workshops , pp. 664-671
- Khan, M.U.G.¹ Zhang, L.² Gotoh, Y.³

22
- 84944113729
- Unifying visual-semantic embeddings with multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. In NIPS Deep Learning Workshop, 2014.
- (2014) NIPS Deep Learning Workshop
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

23
- 0036843382
- Natural language description of human activities from video images based on concept hierarchy of actions
- A. Kojima, T. Tamura, and K. Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. International Journal of Computer Vision, 50(2):171-184, 2002.
- (2002) International Journal of Computer Vision , vol.50 , Issue.2 , pp. 171-184
- Kojima, A.¹ Tamura, T.² Fukunaga, K.³

24
- 84893398951
- Generating natural-language video descriptions using text-mined knowledge
- N. Krishnamoorthy, G. Malkarnenkar, R. J. Mooney, K. Saenko, and S. Guadarrama. Generating natural-language video descriptions using text-mined knowledge. In AAAI Conference on Artificial Intelligence, pages 541-547, 2013.
- (2013) AAAI Conference on Artificial Intelligence , pp. 541-547
- Krishnamoorthy, N.¹ Malkarnenkar, G.² Mooney, R.J.³ Saenko, K.⁴ Guadarrama, S.⁵

25
- 0001857994
- Efficient backprop
- Y. LeCun, L. Bottou, G. Orr, and K. Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, page 546. 1998.
- (1998) Neural Networks: Tricks of the Trade, Page 546.
- LeCun, Y.¹ Bottou, L.² Orr, G.³ Müller, K.⁴

26
- 51849094354
- Save: A framework for semantic annotation of visual events
- M.W. Lee, A. Hakeem, N. Haering, and S.-C. Zhu. SAVE: A framework for semantic annotation of visual events. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1-8, 2008.
- (2008) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops , pp. 1-8
- Lee, M.W.¹ Hakeem, A.² Haering, N.³ Zhu, S.-C.⁴

27
- 84943794827
- A hierarchical neural autoencoder for paragraphs and documents
- J. Li, M. Luong, and D. Jurafsky. A hierarchical neural autoencoder for paragraphs and documents. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 1106-1115, 2015.
- (2015) Proceedings of the Annual Meeting of the Association for Computational Linguistics , pp. 1106-1115
- Li, J.¹ Luong, M.² Jurafsky, D.³

28
- 84959935599
- Hierarchical recurrent neural network for document modeling
- Sept.
- R. Lin, S. Liu, M. Yang, M. Li, M. Zhou, and S. Li. Hierarchical recurrent neural network for document modeling. pages 899-907. Conference on Empirical Methods in Natural Language Processing, Sept. 2015.
- (2015) Conference on Empirical Methods in Natural Language Processing , pp. 899-907
- Lin, R.¹ Liu, S.² Yang, M.³ Li, M.⁴ Zhou, M.⁵ Li, S.⁶

29
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-rnn)
- J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-rnn). ICLR, 2015.
- (2015) ICLR
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

30
- 84965160495
- Learning like a child: Fast novel visual concept learning from sentence descriptions of images
- J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. L. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. 2015.
- (2015) Learning Like A Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.L.⁶

31
- 77956509090
- Rectified linear units improve restricted boltzmann machines
- V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, pages 807-814, 2010.
- (2010) ICML , pp. 807-814
- Nair, V.¹ Hinton, G.E.²

32
- 85060437486
- Jointly modeling embedding and translation to bridge video and language
- abs/1505.01861
- Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui. Jointly modeling embedding and translation to bridge video and language. CoRR, abs/1505.01861, 2015.
- (2015) CoRR
- Pan, Y.¹ Mei, T.² Yao, T.³ Li, H.⁴ Rui, Y.⁵

33
- 85133336275
- Bleu: A method for automatic evaluation of machine translation
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, pages 311-318, 2002.
- (2002) ACL , pp. 311-318
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

34
- 84965149840
- Expressing an image stream with a sequence of natural sentences
- C. C. Park and G. Kim. Expressing an image stream with a sequence of natural sentences. In Advances in Neural Information Processing Systems, pages 73-81, 2015.
- (2015) Advances in Neural Information Processing Systems , pp. 73-81
- Park, C.C.¹ Kim, G.²

35
- 84994149921
- Sequence level training with recurrent neural networks
- M. Ranzato, S. Chopra, M. Auli, and W. Zaremba. Sequence level training with recurrent neural networks. CoRR, abs/1511.06732, 2015.
- (2015) CoRR, abs/1511.06732
- Ranzato, M.¹ Chopra, S.² Auli, M.³ Zaremba, W.⁴

36
- 84960170289
- Coherent multi-sentence video description with variable level of detail
- A. Rohrbach, M. Rohrbach, W. Qiu, A. Friedrich, M. Pinkal, and B. Schiele. Coherent multi-sentence video description with variable level of detail. In German Conference on Pattern Recognition (GCPR), September 2014.
- (2014) German Conference on Pattern Recognition (GCPR), September
- Rohrbach, A.¹ Rohrbach, M.² Qiu, W.³ Friedrich, A.⁴ Pinkal, M.⁵ Schiele, B.⁶

37
- 84898775239
- Translating video content to natural language descriptions
- M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In Proceedings of the IEEE International Conference on Computer Vision, pages 433-440, 2013.
- (2013) Proceedings of the IEEE International Conference on Computer Vision , pp. 433-440
- Rohrbach, M.¹ Qiu, W.² Titov, I.³ Thater, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

38
- 84947041871
- Imagenet large scale visual recognition challenge
- Apr.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. Imagenet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), pages 1-42, Apr. 2015.
- (2015) International Journal of Computer Vision (IJCV) , pp. 1-42
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰ Berg, A.C.¹¹ Fei-Fei, L.¹²

39
- 0031268931
- Bidirectional recurrent neural networks
- Nov.
- M. Schuster and K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673-2681, Nov. 1997.
- (1997) IEEE Transactions on Signal Processing , vol.45 , Issue.11 , pp. 2673-2681
- Schuster, M.¹ Paliwal, K.²

40
- 84925410541
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2014.
- (2014) International Conference on Learning Representations
- Simonyan, K.¹ Zisserman, A.²

41
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929-1958, 2014.
- (2014) Journal of Machine Learning Research, 15:1929-1958
- Srivastava, N.¹ Hinton, G.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

42
- 84906504815
- Semantic aware video transcription using random forest classifiers
- C. Sun and R. Nevatia. Semantic aware video transcription using random forest classifiers. In Proceedings of the European Conference on Computer Vision, pages 772-786, 2014.
- (2014) Proceedings of the European Conference on Computer Vision , pp. 772-786
- Sun, C.¹ Nevatia, R.²

43
- 84928547704
- Sequence to sequence learning with neural networks
- I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, 2014.
- (2014) Advances in Neural Information Processing Systems
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

44
- 84893343292
- Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude
- T. Tieleman and G. Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4, 2012.
- (2012) COURSERA: Neural Networks for Machine Learning, 4
- Tieleman, T.¹ Hinton, G.²

45
- 84969504307
- C3D: Generic features for video analysis
- D. Tran, L. D. Bourdev, R. Fergus, L. Torresani, and M. Paluri. C3D: generic features for video analysis. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
- (2015) Proceedings of the IEEE International Conference on Computer Vision
- Tran, D.¹ Bourdev, L.D.² Fergus, R.³ Torresani, L.⁴ Paluri, M.⁵

46
- 84956980995
- Cider: Consensus-based image description evaluation
- R. Vedantam, C. L. Zitnick, and D. Parikh. Cider: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4566-4575, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 4566-4575
- Vedantam, R.¹ Zitnick, C.L.² Parikh, D.³

47
- 84973882730
- Sequence to sequence-video to text
- S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko. Sequence to sequence-video to text. In Proceedings of the IEEE International Conference on Computer Vision, pages 4534-4542, 2015.
- (2015) Proceedings of the IEEE International Conference on Computer Vision , pp. 4534-4542
- Venugopalan, S.¹ Rohrbach, M.² Donahue, J.³ Mooney, R.J.⁴ Darrell, T.⁵ Saenko, K.⁶

48
- 84959876769
- Translating videos to natural language using deep recurrent neural networks
- S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. J. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In Proceedings of the North American Chapter of the Association for Computational Linguistics, pages 1494-1504, 2015.
- (2015) Proceedings of the North American Chapter of the Association for Computational Linguistics , pp. 1494-1504
- Venugopalan, S.¹ Xu, H.² Donahue, J.³ Rohrbach, M.⁴ Mooney, R.J.⁵ Saenko, K.⁶

49
- 84980377939
- A neural conversational model
- O. Vinyals and Q. V. Le. A neural conversational model. In ICML Deep Learning Workshop, 2015.
- (2015) ICML Deep Learning Workshop
- Vinyals, O.¹ Le, Q.V.²

50
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3156-3164, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 3156-3164
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

51
- 80052877143
- Action recognition by dense trajectories
- June
- H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action Recognition by Dense Trajectories. In IEEE Conference on Computer Vision & Pattern Recognition, pages 3169-3176, June 2011.
- (2011) IEEE Conference on Computer Vision & Pattern Recognition , pp. 3169-3176
- Wang, H.¹ Kläser, A.² Schmid, C.³ Liu, C.-L.⁴

52
- 84959897734
- Semantically conditioned lstm-based natural language generation for spoken dialogue systems
- T. Wen, M. Gasic, N. Mrksic, P. Su, D. Vandyke, and S. J. Young. Semantically conditioned lstm-based natural language generation for spoken dialogue systems. In Conference on Empirical Methods in Natural Language Processing, 2015.
- (2015) Conference on Empirical Methods in Natural Language Processing
- Wen, T.¹ Gasic, M.² Mrksic, N.³ Su, P.⁴ Vandyke, D.⁵ Young, S.J.⁶

53
- 0025503558
- Backpropagation through time: What does it do and how to do it
- P. Werbos. Backpropagation through time: what does it do and how to do it. In Proceedings of IEEE, volume 78, pages 1550-1560, 1990.
- (1990) Proceedings of IEEE, Volume 78 , pp. 1550-1560
- Werbos, P.¹

54
- 84980404991
- A multi-scale multiple instance video description network
- abs/1505.05914
- H. Xu, S. Venugopalan, V. Ramanishka, M. Rohrbach, and K. Saenko. A multi-scale multiple instance video description network. CoRR, abs/1505.05914, 2015.
- (2015) CoRR
- Xu, H.¹ Venugopalan, S.² Ramanishka, V.³ Rohrbach, M.⁴ Saenko, K.⁵

55
- 84940762015
- Jointly modeling deep video and compositional text to bridge vision and language in a unified framework
- R. Xu, C. Xiong, W. Chen, and J. J. Corso. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In Proceedings of AAAI Conference on Artificial Intelligence, 2015.
- (2015) Proceedings of AAAI Conference on Artificial Intelligence
- Xu, R.¹ Xiong, C.² Chen, W.³ Corso, J.J.⁴

56
- 84973884896
- Describing videos by exploiting temporal structure
- L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In Proceedings of the IEEE International Conference on Computer Vision, pages 4507-4515, 2015.
- (2015) Proceedings of the IEEE International Conference on Computer Vision , pp. 4507-4515
- Yao, L.¹ Torabi, A.² Cho, K.³ Ballas, N.⁴ Pal, C.⁵ Larochelle, H.⁶ Courville, A.⁷

57
- 84961226145
- Learning to describe video with weak supervision by exploiting negative sentential information
- Jan.
- H. Yu and J. M. Siskind. Learning to describe video with weak supervision by exploiting negative sentential information. In AAAI Conference on Artificial Intelligence, pages 3855-3863, Jan. 2015.
- (2015) AAAI Conference on Artificial Intelligence , pp. 3855-3863
- Yu, H.¹ Siskind, J.M.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.