



Volume 23, Issue 2-3, 2009, Pages 71-103

The NIST 2008 metrics for machine translation challenge-overview, methodology, metrics, and results

Author keywords

Automated MT metrics; Metric evaluation; MetricsMATR

Indexed keywords

CORRELATION STATISTICS; GENERAL CLASS; MACHINE TRANSLATIONS; METRIC EVALUATION; SYSTEM LEVELS; TEST DATA;

EID: 77954761816     PISSN: 09226567     EISSN: None     Source Type: Journal    
DOI: 10.1007/s10590-009-9065-6     Document Type: Article
Times cited: 45

References (31)
  • 1
    • 77954762781
    • Extending the BLEU MT evaluation method with frequency weightings. Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04)
    • Barcelona, Spain
    • Babych B, Hartley A (2004) Extending the BLEU MT evaluation method with frequency weightings. Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL'04). Association for Computational Linguistics, Barcelona, Spain
    • (2004) Association for Computational Linguistics
    • Babych, B.1    Hartley, A.2
  • 7
    • 70349246101
    • Evaluating message understanding systems: An analysis of the third message understanding conference (MUC-3)
    • Chinchor N, Hirschman L, Lewis DD (1993) Evaluating message understanding systems: an analysis of the third message understanding conference (MUC-3). Computational Linguistics 19 (3)
    • (1993) Computational Linguistics , vol.19 , Issue.3
    • Chinchor, N.1    Hirschman, L.2    Lewis, D.D.3
  • 8
    • 84973587732
    • A coefficient of agreement for nominal scales
    • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 37-46
    • (1960) Educ Psychol. Meas. , pp. 37-46
    • Cohen, J.1
  • 9
    • 84857783669
    • Normalization for automated metrics: English and Arabic speech translation
    • Association for Machine Translation in the Americas, Ottawa, ON, Canada
    • Condon S, Sanders GA, Parvaz D, Rubenstein A, Doran C, Aberdeen J, Oshika B (2009) Normalization for automated metrics: English and Arabic speech translation. Proceedings of MT Summit XII. Association for Machine Translation in the Americas, Ottawa, ON, Canada
    • (2009) Proceedings of MT Summit , vol.12
    • Condon, S.1    Sanders, G.A.2    Parvaz, D.3    Rubenstein, A.4    Doran, C.5    Aberdeen, J.6    Oshika, B.7
  • 10
    • 33748682610
    • Correlating automated and human assessments of machine translation quality
    • Association for Machine Translation in the Americas, New Orleans, LA
    • Coughlin D (2003) Correlating automated and human assessments of machine translation quality. Proceedings of MT Summit IX. Association for Machine Translation in the Americas, New Orleans, LA
    • (2003) Proceedings of MT Summit IX
    • Coughlin, D.1
  • 12
    • 15344349756
    • Wordnet: An electronic lexical database
    • Fellbaum C (1998) Wordnet: an electronic lexical database. Bradford Books
    • (1998) Bradford Books
    • Fellbaum, C.1
  • 13
    • 3343019470
    • Measuring nominal scale agreement among many raters
    • Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76
    • (1971) Psychol. Bull. , vol.76
    • Fleiss, J.L.1
  • 14
    • 33645066726
    • Large sample standard errors of Kappa and weighted Kappa
    • Fleiss JL, Cohen J, Everitt BS (1969) Large sample standard errors of Kappa and weighted Kappa. Psychol Bull 72
    • (1969) Psychol. Bull. , vol.72
    • Fleiss, J.L.1    Cohen, J.2    Everitt, B.S.3
  • 17
    • 0002282074
    • A new measure of rank correlation
    • Kendall MG (1938) A new measure of rank correlation. Biometrika 30
    • (1938) Biometrika 30
    • Kendall, M.G.1
  • 20
    • 0004030721
    • Computer intensive methods for testing hypotheses
    • Wiley, New York
    • Noreen EW (1989) Computer intensive methods for testing hypotheses. An introduction. Wiley, New York
    • (1989) An Introduction
    • Noreen, E.W.1
  • 21
    • 84944098666
    • Minimum error rate training in statistical machine translation
    • Sapporo, Japan
    • Och FJ (2003) Minimum error rate training in statistical machine translation. Association for Computational Linguistics, Sapporo, Japan
    • (2003) Association for Computational Linguistics
    • Och, F.J.1
  • 22
    • 0141524308
    • BLEU: A method for automatic evaluation of machine translation
    • Yorktown Heights NY. IBM Research Division
    • Papineni K, Roukos S, Ward T, Zhu W-J (2001) BLEU: a method for automatic evaluation of machine translation. Technical Report, Yorktown Heights NY. IBM Research Division
    • (2001) Technical Report
    • Papineni, K.1    Roukos, S.2    Ward, T.3    Zhu, W.-J.4
  • 24
    • 0001454867
    • On a criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen in random sampling
    • Pearson K (1900) On a criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen in random sampling. Philos Mag 50
    • (1900) Philos Mag. , vol.50
    • Pearson, K.1
  • 26
    • 77954757541
    • NIST metrics for machine translation challenge (MetricsMATR)
    • National Institute of Standards and Technology, Accessed 28 Oct. 2009
    • Przybocki M, Peterson K, Bronsart S (2008) NIST metrics for machine translation challenge (MetricsMATR). NIST Multimodal Information Group. National Institute of Standards and Technology. http://www.nist.gov/speech/tests/metricsmatr/2008/doc/mm08-evalplan-v1.1pdf. Accessed 28 Oct. 2009
    • (2008) NIST Multimodal Information Group
    • Przybocki, M.1    Peterson, K.2    Bronsart, S.3
  • 27
    • 77954760686
    • On some pitfalls in automatic evaluation and significance testing for MT. ACL-05 workshop on intrinsic and extrinsic evaluation measures for MT and/or summarization
    • Ann Arbor, MI
    • Riezler S, Maxwell JT (2005) On some pitfalls in automatic evaluation and significance testing for MT. ACL-05 workshop on intrinsic and extrinsic evaluation measures for MT and/or summarization. Association for Computational Linguistics, Ann Arbor, MI
    • (2005) Association for Computational Linguistics
    • Riezler, S.1    Maxwell, J.T.2
  • 28
    • 77954761840
    • Odds of successful transfer of low-level concepts: A key metric for bidirectional speech-to-speech machine translation in DARPA's TRANSTAC program
    • European Language Resources Association ELRA, Marrakech, Morocco
    • Sanders GA, Bronsart S, Condon S, Schlenoff C (2008) Odds of successful transfer of low-level concepts: a key metric for bidirectional speech-to-speech machine translation in DARPA's TRANSTAC program. Proceedings of the 6th international conference on language resources and evaluation (LREC'08). European Language Resources Association (ELRA), Marrakech, Morocco
    • (2008) Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC'08)
    • Sanders, G.A.1    Bronsart, S.2    Condon, S.3    Schlenoff, C.4
  • 30
    • 0002965815
    • The proof and measurement of association between two things
    • Spearman CE (1904) The proof and measurement of association between two things. Am J Psychol 15
    • (1904) Am. J. Psychol , vol.15
    • Spearman, C.E.1
  • 31
    • 0001884644
    • Individual comparisons by ranking methods
    • Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics 1
    • (1945) Biometrics 1
    • Wilcoxon, F.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.