Volume 11, Issue 3, 2014, Pages 304-324

Interrater reliability estimators commonly used in scoring language assessments: A Monte Carlo investigation of estimator accuracy

EID: 84906857063   PISSN: 15434303   EISSN: 15434311   Source Type: Journal
DOI: 10.1080/15434303.2014.937486   Document Type: Article
Times cited: 9

References (57)
  • 1
    • Babakus, E., Ferguson, C. E., & Jöreskog, K. G. (1987). The sensitivity of confirmatory maximum likelihood factor analysis to violations of measurement scale and distributional assumptions. Journal of Marketing Research, 24(2), 222-229.
  • 2
    • Bandalos, D. L., & Leite, W. (2013). Use of Monte Carlo studies in structural equation modeling research. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 625-666). Charlotte, NC: Information Age.
  • 3
    • Barkaoui, K. (2010). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7(1), 54-74.
  • 5
    • Bollen, K. A., & Barb, K. H. (1981). Pearson's r and coarsely categorized data. American Sociological Review, 46(2), 232-239.
  • 7
    • Cheng, L., & Qi, L. (2006). Description and examination of the National Matriculation English Test. Language Assessment Quarterly, 3(1), 53-70.
  • 8
    • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.
  • 9
    • Common Core State Standards Initiative. (2012). In the states. Retrieved from http://www.corestandards.org/in-the-states
  • 10
    • Coote, L. (1998). A review and recommended approach for estimating conditional structural equation models. Paper presented at the annual conference of the Australia and New Zealand Marketing Academy, University of Otago, Dunedin.
  • 12
    • Cureton, E. E. (1958). The average Spearman rank criterion when ties are present. Psychometrika, 23(3), 271-272.
  • 13
    • East, M. (2009). Evaluating the reliability of a detailed analytic scoring rubric for foreign language writing. Assessing Writing, 14(2), 88-115.
  • 14
    • Finney, S. J., & DiStefano, C. (2013). Nonnormal and categorical data in structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 439-492). Charlotte, NC: Information Age.
  • 15
    • Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis. Psychological Methods, 9(4), 466-491.
  • 16
    • Gadermann, A. M., Guhn, M., & Zumbo, B. D. (2012). Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research, & Evaluation, 17(3). Retrieved from http://pareonline.net/getvn.asp?v=17&n=3
  • 17
    • Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65-110). Westport, CT: American Council on Education and Praeger Publishers.
  • 18
    • Hamp-Lyons, L., & Henning, G. (1991). Communicative writing profiles: An investigation of the transferability of a multiple-trait scoring instrument across ESL writing assessment contexts. Language Learning, 41(3), 337-373.
  • 19
    • Harman, H. H. (1967). Modern factor analysis. Chicago, IL: University of Chicago Press.
  • 20
    • Hayes, A. F., & Krippendorff, K. (2007). Answering the call for a standard reliability measure for coding data. Communication Methods and Measures, 1(1), 77-89.
  • 22
    • Isaacs, T., & Thomson, R. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation: Revisiting research conventions. Language Assessment Quarterly, 10(2), 135-159.
  • 24
    • Johnson, R. L., McDaniel, F., & Willeke, M. (2000). Using portfolios in program evaluation: An investigation of interrater reliability. The American Journal of Evaluation, 21(1), 65-80.
  • 25
    • Johnson, R. L., Penny, J. A., & Gordon, B. (2000). The relation between score resolution methods and interrater reliability: An empirical study of an analytic scoring rubric. Applied Measurement in Education, 13(2), 121-138.
  • 27
    • Jöreskog, K. G. (1994). On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika, 59(3), 381-389.
  • 31
    • Linacre, J. M. (1994). Constructing measurement with a many-facet Rasch model. In M. Wilson (Ed.), Objective measurement: Theory in practice (Vol. 2, pp. 356-442). Newark, NJ: Ablex.
  • 32
    • Linacre, J. M. (2002). Judge ratings with forced agreement. Rasch Measurement Transactions, 16(1), 857-858.
  • 34
    • Milanovic, M., Saville, N., Pollitt, A., & Cook, A. (1996). Developing rating scales for CASE: Theoretical concerns and analyses. In M. Milanovic, N. Saville, & A. Pollitt (Eds.), Validation in language testing (pp. 15-38). Clevedon, United Kingdom: Multilingual Matters.
  • 35
    • Moskal, B. M., & Leydens, J. A. (2000). Scoring rubric development: Validity and reliability. Practical Assessment, Research, and Evaluation, 7(10). Retrieved from http://PAREonline.net/getvn.asp?v=7&n=10
  • 36
    • National Center for Education Statistics. (2012). The Nation's Report Card: Writing 2011 (NCES 2012-470). Washington, DC: Institute of Education Sciences, U.S. Department of Education.
  • 37
    • Nie, Y., Yeo, S., & Lau, S. (2007). Application of generalizability theory in the investigation of the quality of journal writing in mathematics. Studies in Educational Evaluation, 33(3-4), 371-383.
  • 39
    • Ockey, G., Koyama, D., & Setoguchi, E. (2013). Stakeholder input and test design: A case study on changing the interlocutor familiarity facet of the group oral discussion test. Language Assessment Quarterly, 10(3), 292-308.
  • 40
    • Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443-460.
  • 41
    • Penny, J. A., Johnson, R. L., & Gordon, B. (2000). The effect of score augmentation on the interrater reliability: An empirical study of a holistic rubric. Assessing Writing, 7(2), 142-164.
  • 43
    • Phillips, S. (2000). Legal corner: GI forum v TEA. NCME Newsletter, 8(2).
  • 44
    • SAS Institute. (2010). SAS 9.2 help and documentation. Cary, NC: SAS Institute, Inc.
  • 47
    • Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlation: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428.
  • 48
    • Smarter Balanced Assessment Consortium. (2012). Sample items and performance tasks. Retrieved from http://www.smarterbalanced.org/sample-items-and-performance-tasks/
  • 49
    • Stanley, J. C. (1971). Reliability. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 356-442). Washington, DC: American Council on Education.
  • 50
    • Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research, and Evaluation, 9(4). Retrieved from http://PAREonline.net/getvn.asp?v=9&n=4
  • 51
    • Stemler, S. E. (2007). Interrater reliability. In N. Salkind (Ed.), Encyclopedia of measurement and statistics (pp. 484-486). Thousand Oaks, CA: Sage.
  • 52
    • Struthers, L., Lapadat, J., & MacMillan, P. (2013). Assessing cohesion in children's writing: Development of a checklist. Assessing Writing, 18(3), 187-201.
  • 53
    • Supovitz, J., MacGowan, A., & Slattery, J. (1997). Assessing agreement: An examination of the interrater reliability of portfolio assessment in Rochester, New York. Educational Assessment, 4(3), 237-259.
  • 57
    • Zhang, Y., & Elder, C. (2009). Measuring the speaking proficiency of advanced EFL learners in China: The CET-SET solution. Language Assessment Quarterly, 6(4), 298-314.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.