



Volume 2017-January, 2017, Pages 6298-6306

SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; CONVOLUTION; ENCODING (SYMBOLS); NEURAL NETWORKS; SIGNAL ENCODING

EID: 85029348551     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/CVPR.2017.667     Document Type: Conference Paper
Times cited: 1744

References (43)
  • 2
    • 84959933549
    • Neural machine translation by jointly learning to align and translate
    • D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2014.
    • (2014) ICLR
    • Bahdanau, D.; Cho, K.; Bengio, Y.
  • 3
    • 85116156579
    • Meteor: An automatic metric for mt evaluation with improved correlation with human judgments
    • S. Banerjee and A. Lavie. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In ACL, 2005.
    • (2005) ACL
    • Banerjee, S.; Lavie, A.
  • 4
    • 85044532646
    • Abc-cnn: An attention based convolutional neural network for visual question answering
    • K. Chen, J. Wang, L.-C. Chen, H. Gao, W. Xu, and R. Nevatia. Abc-cnn: An attention based convolutional neural network for visual question answering. In CVPR, 2016.
    • (2016) CVPR
    • Chen, K.; Wang, J.; Chen, L.-C.; Gao, H.; Xu, W.; Nevatia, R.
  • 5
    • 0036517313
    • Control of goal-directed and stimulus-driven attention in the brain
    • M. Corbetta and G. L. Shulman. Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 2002.
    • (2002) Nature Reviews Neuroscience
    • Corbetta, M.; Shulman, G.L.
  • 7
    • 84965148420
    • Are you talking to a machine? Dataset and methods for multilingual image question
    • H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? Dataset and methods for multilingual image question. In NIPS, 2015.
    • (2015) NIPS
    • Gao, H.; Mao, J.; Zhou, J.; Huang, Z.; Wang, L.; Xu, W.
  • 10
    • 84883394520
    • Framing image description as a ranking task: Data, models and evaluation metrics
    • M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 2013.
    • (2013) JAIR
    • Hodosh, M.; Young, P.; Hockenmaier, J.
  • 11
    • 84973917813
    • Guiding the long-short term memory model for image caption generation
    • X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars. Guiding the long-short term memory model for image caption generation. In ICCV, 2015.
    • (2015) ICCV
    • Jia, X.; Gavves, E.; Fernando, B.; Tuytelaars, T.
  • 12
    • 84962850780
    • Deep compositional cross-modal learning to rank via local-global alignment
    • X. Jiang, F. Wu, X. Li, Z. Zhao, W. Lu, S. Tang, and Y. Zhuang. Deep compositional cross-modal learning to rank via local-global alignment. In ACM MM, pages 69-78, 2015.
    • (2015) ACM MM, pp. 69-78
    • Jiang, X.; Wu, F.; Li, X.; Zhao, Z.; Lu, W.; Tang, S.; Zhuang, Y.
  • 13
    • 84946734827
    • Deep visual-semantic alignments for generating image descriptions
    • A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
    • (2015) CVPR
    • Karpathy, A.; Fei-Fei, L.
  • 15
    • 79960403098
    • Rouge: A package for automatic evaluation of summaries
    • C.-Y. Lin. Rouge: A package for automatic evaluation of summaries. In ACL, 2004.
    • (2004) ACL
    • Lin, C.-Y.
  • 17
    • 84973896625
    • Ask your neurons: A neural-based approach to answering questions about images
    • M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
    • (2015) ICCV
    • Malinowski, M.; Rohrbach, M.; Fritz, M.
  • 18
    • 85083950512
    • Deep captioning with multimodal recurrent neural networks (m-rnn)
    • J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-rnn). In ICLR, 2015.
    • (2015) ICLR
    • Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Huang, Z.; Yuille, A.
  • 19
    • 84937959846
    • Recurrent models of visual attention
    • V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In NIPS, 2014.
    • (2014) NIPS
    • Mnih, V.; Heess, N.; Graves, A.
  • 20
    • 85133336275
    • Bleu: A method for automatic evaluation of machine translation
    • K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002.
    • (2002) ACL
    • Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J.
  • 21
    • 84965170394
    • Exploring models and data for image question answering
    • M. Ren, R. Kiros, and R. Zemel. Exploring models and data for image question answering. In NIPS, 2015.
    • (2015) NIPS
    • Ren, M.; Kiros, R.; Zemel, R.
  • 23
    • 84951869843
    • Supervised discrete hashing
    • F. Shen, C. Shen, W. Liu, and H. Tao Shen. Supervised discrete hashing. In CVPR, pages 37-45, 2015.
    • (2015) CVPR, pp. 37-45
    • Shen, F.; Shen, C.; Liu, W.; Tao Shen, H.
  • 26
    • 84937961845
    • Deep networks with internal selective attention through feedback connections
    • M. F. Stollenga, J. Masci, F. Gomez, and J. Schmidhuber. Deep networks with internal selective attention through feedback connections. In NIPS, 2014.
    • (2014) NIPS
    • Stollenga, M.F.; Masci, J.; Gomez, F.; Schmidhuber, J.
  • 27
    • 84986296808
    • Rethinking the inception architecture for computer vision
    • C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, pages 2818-2826, 2016.
    • (2016) CVPR, pp. 2818-2826
    • Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z.
  • 28
    • 84956980995
    • Cider: Consensus-based image description evaluation
    • R. Vedantam, C. Lawrence Zitnick, and D. Parikh. Cider: Consensus-based image description evaluation. In CVPR, 2015.
    • (2015) CVPR
    • Vedantam, R.; Lawrence Zitnick, C.; Parikh, D.
  • 31
    • 84946747440
    • Show and tell: A neural image caption generator
    • O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
    • (2015) CVPR
    • Vinyals, O.; Toshev, A.; Bengio, S.; Erhan, D.
  • 33
    • 85035008367
    • Ask, attend and answer: Exploring question-guided spatial attention for visual question answering
    • H. Xu and K. Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, 2016.
    • (2016) ECCV
    • Xu, H.; Saenko, K.
  • 35
    • 84986334021
    • Stacked attention networks for image question answering
    • Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
    • (2016) CVPR
    • Yang, Z.; He, X.; Gao, J.; Deng, L.; Smola, A.
  • 37
    • 84986317307
    • Image captioning with semantic attention
    • Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
    • (2016) CVPR
    • You, Q.; Jin, H.; Wang, Z.; Fang, C.; Luo, J.
  • 38
    • 84906494296
    • From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
    • P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014.
    • (2014) TACL
    • Young, P.; Lai, A.; Hodosh, M.; Hockenmaier, J.
  • 40
    • 84921476116
    • Visualizing and understanding convolutional networks
    • M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
    • (2014) ECCV
    • Zeiler, M.D.; Fergus, R.
  • 41
    • 85029388674
    • Visual translation embedding network for visual relation detection
    • H. Zhang, Z. Kyaw, S.-F. Chang, and T.-S. Chua. Visual translation embedding network for visual relation detection. In CVPR, 2017.
    • (2017) CVPR
    • Zhang, H.; Kyaw, Z.; Chang, S.-F.; Chua, T.-S.
  • 42
    • 84994666699
    • Partial multimodal sparse coding via adaptive similarity structure regularization
    • Z. Zhao, H. Lu, C. Deng, X. He, and Y. Zhuang. Partial multimodal sparse coding via adaptive similarity structure regularization. In ACM MM, pages 152-156, 2016.
    • (2016) ACM MM, pp. 152-156
    • Zhao, Z.; Lu, H.; Deng, C.; He, X.; Zhuang, Y.
  • 43
    • 84986275767
    • Visual7w: Grounded question answering in images
    • Y. Zhu, O. Groth, M. Bernstein, and L. Fei-Fei. Visual7w: Grounded question answering in images. In CVPR, 2016.
    • (2016) CVPR
    • Zhu, Y.; Groth, O.; Bernstein, M.; Fei-Fei, L.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.