2016, Pages 2122-2132

How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation

Author keywords

[No Author keywords available]

Indexed keywords

SPEECH PROCESSING;

EID: 85072827450
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.18653/v1/d16-1230
Document Type: Conference Paper
Times cited: 1271

References (45)
  • 4
    • EID: 84859906372
    • A. Cahill. 2009. Correlating human and automatic evaluation of a German surface realiser. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 97-100. Association for Computational Linguistics.
  • 5
    • EID: 84893361786
    • C. Callison-Burch, M. Osborne, and P. Koehn. 2006. Re-evaluating the role of BLEU in machine translation research. In EACL, volume 6, pages 249-256.
  • 8
    • EID: 85122610414
    • B. Chen and C. Cherry. 2014. A systematic comparison of smoothing techniques for sentence-level BLEU. ACL 2014, page 362.
  • 9
    • EID: 58149412516
    • J. Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4):213.
  • 11
    • EID: 80053431219
    • P. W. Foltz, W. Kintsch, and T. K. Landauer. 1998. The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25(2-3):285-307.
  • 15
    • EID: 84944091060
    • Y. Graham, N. Mathur, and T. Baldwin. 2015. Accurate evaluation of segment-level machine translation metrics. In Proc. of NAACL-HLT, pages 1183-1191.
  • 22
    • EID: 0000600219
    • T. K. Landauer and S. T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211.
  • 27
    • EID: 84988430909
    • R. Lowe, N. Pow, I. V. Serban, and J. Pineau. 2015. The Ubuntu Dialogue Corpus: A large dataset for research in unstructured multi-turn dialogue systems. In SIGDIAL.
  • 29
    • EID: 84859927665
    • J. Mitchell and M. Lapata. 2008. Vector-based models of semantic composition. In ACL, pages 236-244.
  • 33
    • EID: 71749094730
    • E. Reiter and A. Belz. 2009. An investigation into the validity of some metrics for automatically evaluating natural language generation systems. Computational Linguistics, 35(4):529-558.
  • 36
    • EID: 85036049150
    • V. Rus and M. Lintean. 2012. A comparison of greedy and optimal assessment of natural language student input using word-to-word similarity metrics. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pages 157-162, Stroudsburg, PA, USA. Association for Computational Linguistics.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.