



Volume 2014, Issue 1, 2014, Pages 1-21

Monitoring of Scoring Using the e-rater® Automated Scoring System and Human Raters on a Writing Test

Author keywords

automated scoring; constructed response items; human scoring; quality control


EID: 84964439576     PISSN: None     EISSN: 23308516     Source Type: Journal    
DOI: 10.1002/ets2.12005     Document Type: Article
Times cited : (10)

References (41)
  • 1
    • 85164501606
    • ASTM manual on presentation of data and control chart analysis (Pub. STP15D). Philadelphia, PA: Author
    • American Society for Testing and Materials. (1976). ASTM manual on presentation of data and control chart analysis (Pub. No. STP15D). Philadelphia, PA: Author.
    • (1976)
  • 2
    • 85164519291
    • Construct validity of e-rater in scoring TOEFL essays (Research Report RR–07–21). Princeton, NJ: Educational Testing Service
    • Attali, Y. (2007). Construct validity of e-rater in scoring TOEFL essays (Research Report No. RR–07–21). Princeton, NJ: Educational Testing Service.
    • (2007)
    • Attali, Y.1
  • 3
    • 85164524339
    • e-rater evaluation for TOEFL iBT independent essays, Unpublished manuscript
    • Attali, Y. (2008). e-rater evaluation for TOEFL iBT independent essays. Unpublished manuscript.
    • (2008)
    • Attali, Y.1
  • 4
    • 77956291605
    • Performance of a generic approach in automated scoring. Journal of Technology, Learning, and Assessment, 10
    • Attali, Y., Bridgeman, B., & Trapani, C. (2010). Performance of a generic approach in automated scoring. Journal of Technology, Learning, and Assessment, 10(3), 1–16.
    • (2010) Journal of Technology, Learning, and Assessment , vol.10 , Issue.3 , pp. 1-16
    • Attali, Y.1    Bridgeman, B.2    Trapani, C.3
  • 6
    • 79961058822
    • A validity-based approach to quality control and assurance of automated scoring
    • Bejar, I. I. (2011). A validity-based approach to quality control and assurance of automated scoring. Assessment in Education: Principles, Policy and Practice, 18(3), 319–341.
    • (2011) Assessment in Education: Principles, Policy and Practice , vol.18 , Issue.3 , pp. 319-341
    • Bejar, I.I.1
  • 7
    • 21644443051
    • Automated essay scoring for nonnative English speakers
    • M. Broman Olsen (Ed.), Morristown, NJ: Association for Computational Linguistics
    • Burstein, J., & Chodorow, M. (1999). Automated essay scoring for nonnative English speakers. In M. Broman Olsen (Ed.), Computer mediated language assessment and evaluation in natural language processing (pp. 68–75). Morristown, NJ: Association for Computational Linguistics.
    • (1999) Computer mediated language assessment and evaluation in natural language processing , pp. 68-75
    • Burstein, J.1    Chodorow, M.2
  • 8
    • 85164523976
    • Beyond essay length: Evaluating e–rater's performance on TOEFL essays (Research Report RR–04–04). Princeton, NJ: Educational Testing Service
    • Chodorow, M., & Burstein, J. (2004). Beyond essay length: Evaluating e–rater's performance on TOEFL essays (Research Report No. RR–04–04). Princeton, NJ: Educational Testing Service.
    • (2004)
    • Chodorow, M.1    Burstein, J.2
  • 10
    • 85164510156
    • Principles for building and evaluating e–rater models, Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA
    • Davey, T. (2009, April). Principles for building and evaluating e–rater models. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
    • (2009)
    • Davey, T.1
  • 11
    • 85164522254
    • Studies of a latent class signal detection model for constructed-response scoring II: Incomplete and hierarchical designs (Research Report RR-10-08). Princeton, NJ: Educational Testing Service
    • DeCarlo, L. T. (2010). Studies of a latent class signal detection model for constructed-response scoring II: Incomplete and hierarchical designs (Research Report No. RR-10-08). Princeton, NJ: Educational Testing Service.
    • (2010)
    • DeCarlo, L.T.1
  • 12
    • 85164465793
    • Using rater effects models in NAEP, Unpublished manuscript
    • Donoghue, J. R., McClellan, C. A., & Gladkova, L. (2006). Using rater effects models in NAEP. Unpublished manuscript.
    • (2006)
    • Donoghue, J.R.1    McClellan, C.A.2    Gladkova, L.3
  • 13
    • 84988122960
    • Examining rater errors in the assessment of written composition with a many-faceted Rasch model
    • Engelhard, G., Jr. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31, 93–112.
    • (1994) Journal of Educational Measurement , vol.31 , pp. 93-112
    • Engelhard, G.1
  • 15
    • 85164528388
    • Detect cheating using statistical control methods for computer based CLEP examinations with item exposure risks, Unpublished manuscript
    • Gao, R. (2009). Detect cheating using statistical control methods for computer based CLEP examinations with item exposure risks. Unpublished manuscript.
    • (2009)
    • Gao, R.1
  • 16
    • 85164506651
    • Use of e-rater in scoring of the TOEFL iBT writing test (Research Report RR–11–25). Princeton, NJ: Educational Testing Service
    • Haberman, S. (2011). Use of e-rater in scoring of the TOEFL iBT writing test (Research Report No. RR–11–25). Princeton, NJ: Educational Testing Service.
    • (2011)
    • Haberman, S.1
  • 17
    • 85164463323
    • Measure of agreement, Unpublished manuscript
    • Haberman, S. (2012). Measure of agreement. Unpublished manuscript.
    • (2012)
    • Haberman, S.1
  • 18
    • 85164466277
    • Sample-size requirements for automated essay scoring (Research Report RR–08–32). Princeton, NJ: Educational Testing Service
    • Haberman, S., & Sinharay, S. (2008). Sample-size requirements for automated essay scoring (Research Report No. RR–08–32). Princeton, NJ: Educational Testing Service.
    • (2008)
    • Haberman, S.1    Sinharay, S.2
  • 19
    • 0347672323
    • Analyzing ratings and training raters
    • Kingsbury, F. A. (1922). Analyzing ratings and training raters. Journal of Personnel Research, 1, 377–383.
    • (1922) Journal of Personnel Research , vol.1 , pp. 377-383
    • Kingsbury, F.A.1
  • 20
    • 76349113647
    • Performance assessment
    • R. L. Brennan (Ed.), 4th ed., Westport, CT: Praeger
    • Lane, S., & Stone, C. A. (2006). Performance assessment. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 387–431). Westport, CT: Praeger.
    • (2006) Educational measurement , pp. 387-431
    • Lane, S.1    Stone, C.A.2
  • 21
    • 85164505130
    • Using data mining and quality control techniques to monitor scaled scores, Manuscript submitted for publication
    • Lee, Y.-H., & von Davier, A. A. (2012). Using data mining and quality control techniques to monitor scaled scores. Manuscript submitted for publication.
    • (2012)
    • Lee, Y.-H.1    von Davier, A.A.2
  • 23
    • 85164492359
    • Some small sample statistical quality control procedures for constructed response scoring in language testing, Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, CO
    • Luecht, R. M. (2010, April). Some small sample statistical quality control procedures for constructed response scoring in language testing. Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
    • (2010)
    • Luecht, R.M.1
  • 25
    • 71549124344
    • Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use
    • Myford, C. M., & Wolfe, E. W. (2009). Monitoring rater performance over time: A framework for detecting differential accuracy and differential scale category use. Journal of Educational Measurement, 46, 371–389.
    • (2009) Journal of Educational Measurement , vol.46 , pp. 371-389
    • Myford, C.M.1    Wolfe, E.W.2
  • 26
    • 85164483057
    • NIST/SEMATECH e–handbook of statistical methods
    • National Institute of Standards and Technology. (n.d.). NIST/SEMATECH e–handbook of statistical methods. Retrieved from http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc32.htm
  • 27
    • 77953761690
    • Statistical process control charts for measuring and monitoring temporal consistency of ratings
    • Omar, M. H. (2010). Statistical process control charts for measuring and monitoring temporal consistency of ratings. Journal of Educational Measurement, 47(1), 18–35.
    • (2010) Journal of Educational Measurement , vol.47 , Issue.1 , pp. 18-35
    • Omar, M.H.1
  • 28
    • 0036960386
    • The hierarchical rater model for rated test items and its application to large-scale educational assessment data
    • Patz, R. J., Junker, B. W., Johnson, M. J., & Mariano, L. T. (2002). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Journal of Educational and Behavioral Statistics, 27, 341–384.
    • (2002) Journal of Educational and Behavioral Statistics , vol.27 , pp. 341-384
    • Patz, R.J.1    Junker, B.W.2    Johnson, M.J.3    Mariano, L.T.4
  • 29
    • 85164517795
    • Evaluation of e-rater for the GRE issue and argument prompts (Research Report RR–12–06). Princeton, NJ: Educational Testing Service
    • Ramineni, C., Trapani, C., Williamson, D. M., Davey, T., & Bridgeman, B. (2012). Evaluation of e-rater for the GRE issue and argument prompts (Research Report No. RR–12–06). Princeton, NJ: Educational Testing Service.
    • (2012)
    • Ramineni, C.1    Trapani, C.2    Williamson, D.M.3    Davey, T.4    Bridgeman, B.5
  • 30
    • 85164516285
    • Understanding mean score differences between e–rater and humans for demographic–based groups in GRE, Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA
    • Ramineni, C., Williamson, D., & Weng, V. (2011, April). Understanding mean score differences between e–rater and humans for demographic–based groups in GRE. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
    • (2011)
    • Ramineni, C.1    Williamson, D.2    Weng, V.3
  • 32
    • 85164468855
    • Proposed rater statistics for TOEFL iBT constructed response, Unpublished manuscript
    • Walker, M. (2005). Proposed rater statistics for TOEFL iBT constructed response. Unpublished manuscript.
    • (2005)
    • Walker, M.1
  • 33
    • 85164454399
    • TOEFL Writing Prompt 2 (independent prompt) health check report: Human rater and e-rater, Unpublished manuscript
    • Wang, Z. (2010). TOEFL Writing Prompt 2 (independent prompt) health check report: Human rater and e-rater. Unpublished manuscript.
    • (2010)
    • Wang, Z.1
  • 34
    • 85164460281
    • Proposed procedures to monitor the performance of the human & electronic ratings for all programs, Unpublished manuscript
    • Wang, Z., & von Davier, A. A. (2010). Proposed procedures to monitor the performance of the human & electronic ratings for all programs. Unpublished manuscript.
    • (2010)
    • Wang, Z.1    von Davier, A.A.2
  • 35
    • 85164491659
    • The effects of scoring designs and rater severity on students' ability estimation for constructed response items, Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA
    • Wang, Z., & Yao, L. (2011, April). The effects of scoring designs and rater severity on students' ability estimation for constructed response items. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
    • (2011)
    • Wang, Z.1    Yao, L.2
  • 36
    • 85164459242
    • Investigation of the effects of scoring designs and rater severity on students' ability estimation using different rater models, Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, BC
    • Wang, Z., & Yao, L. (2012, April). Investigation of the effects of scoring designs and rater severity on students' ability estimation using different rater models. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, BC.
    • (2012)
    • Wang, Z.1    Yao, L.2
  • 37
    • 85164492358
    • Effects of different training and scoring approaches on human constructed response scoring, Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY
    • Way, W. D., Vickers, D., & Nichols, P. (2008, March). Effects of different training and scoring approaches on human constructed response scoring. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY.
    • (2008)
    • Way, W.D.1    Vickers, D.2    Nichols, P.3
  • 38
    • 85164463878
    • Statistical quality control (2nd ed.). New York, NY: Author
    • Western Electric Company. (1958). Statistical quality control (2nd ed.). New York, NY: Author.
    • (1958) Statistical quality control
  • 41
    • 85164522364
    • Detecting order effects with a multi–faceted Rasch scale model, Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL
    • Wolfe, E. W., & Myford, C. M. (1997, April). Detecting order effects with a multi–faceted Rasch scale model. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
    • (1997)
    • Wolfe, E.W.1    Myford, C.M.2


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.