



Volume 2, Issue 2, 2006, Pages

Evaluating multiple-choice exams in large introductory physics courses


EID: 33746640536     PISSN: 15549178     EISSN: 15549178     Source Type: Journal    
DOI: 10.1103/PhysRevSTPER.2.020102     Document Type: Article
Times cited: 33

References (30)
  • 2
    • 84988073215
    • On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests
    • R. Lukhele, D. Thissen, and H. Wainer, On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests, J. Educ. Meas. 31, 234 (1994).
    • (1994) J. Educ. Meas. , vol.31 , pp. 234
    • Lukhele, R.1    Thissen, D.2    Wainer, H.3
  • 4
    • 33845514172
    • In-class examinations in college-level science: New theory, new practice
    • S. Tobias and J. B. Raphael, In-class examinations in college-level science: New theory, new practice, J. Sci. Educ. Technol. 5, 311 (1996).
    • (1996) J. Sci. Educ. Technol. , vol.5 , pp. 311
    • Tobias, S.1    Raphael, J.B.2
  • 6
    • 33746613952
    • note
    • Midterm exams are written to be 60 min exams, but students are allotted 90 min to complete them. Students are allotted 3 h to take the final exam. For most students, time is not an issue.
  • 7
    • 33746611849
    • note
    • For clarification, to get each student's even and odd scores, each of the four exams was first ordered by item difficulty. A student's even score is then the sum of their scores on the even questions from exams 1 and 3 and the odd questions from exams 2 and 4. Likewise, a student's odd score is the sum of their scores on the odd questions from exams 1 and 3 and the even questions from exams 2 and 4.
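The split described in this note can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' code: the function name, the data layout (one list of item scores per exam), and the use of a per-item difficulty value as the sort key are all hypothetical.

```python
from typing import Sequence, Tuple


def split_half_scores(exams: Sequence[Sequence[float]],
                      difficulty: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (even_score, odd_score) for one student.

    exams[k][i]      -- the student's score on item i of exam k (k = 0..3)
    difficulty[k][i] -- assumed difficulty value used only to order the items
    """
    even_total = 0.0
    odd_total = 0.0
    for k, (scores, diff) in enumerate(zip(exams, difficulty)):
        # Order this exam's items by difficulty before numbering them.
        order = sorted(range(len(scores)), key=lambda i: diff[i])
        for position, item in enumerate(order, start=1):
            is_even_position = position % 2 == 0
            # Exams 1 and 3 (k = 0, 2) send even positions to the "even" half;
            # exams 2 and 4 (k = 1, 3) send odd positions to the "even" half.
            to_even = is_even_position if k % 2 == 0 else not is_even_position
            if to_even:
                even_total += scores[item]
            else:
                odd_total += scores[item]
    return even_total, odd_total
```

Alternating the assignment between exams keeps either half from systematically receiving the harder item of each adjacent pair.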
  • 8
    • 33746613127
    • note
    • This analysis considers only our A to C students because it is these students whose exam performance shows a strong linear correspondence to their assigned letter grade. That is, these students tend to receive 90% or more credit on the effort components of the course (e.g., homework, quizzes, and laboratories). Thus, their effort grade is not a distinguishing factor in the grade they receive in the course. This is not true, in general, for D and F students. Not only do these students do poorly on the exams, they also tend to do poorly on the effort components of the class. Therefore, the strong linear relationship between exam performance and assigned letter grade that is present for A to C students is not present for D to F students.
  • 9
    • 33746594445
    • note
    • It should also be noted that over this same time span, more than 50 physics professors contributed to creating the exams used in the introductory courses.
  • 10
    • 33746598105
    • note
    • In a second splitting method, the "even" test is literally the collection of the even-numbered questions from the first and third midterms and the odd-numbered questions from the second midterm and final. The reverse construction is made for the "odd" test. The uncertainty found using this method was 3.5%.
  • 11
    • 33746610880
    • note
    • A third splitting method is simply an alteration of the second splitting method. Here, the "even" test is questions 1, 4, 5, 8, 9,..., from the first and third midterms and questions 2, 3, 6, 7,..., from the second midterm and final. The reverse construction is made for the odd test. The uncertainty from this splitting was 3.6%.
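The question-number patterns of the second and third splitting methods can be written compactly. The sketch below is an illustration only; the function name and the convention that exams are numbered 1 to 4, with 4 standing for the final, are assumptions.

```python
def in_even_half(question: int, method: int, exam_index: int) -> bool:
    """Return True if a 1-based question number belongs to the "even" test.

    method 2: even-numbered questions on exams 1 and 3, odd-numbered on 2 and 4
    method 3: questions 1, 4, 5, 8, 9, ... on exams 1 and 3,
              questions 2, 3, 6, 7, ... on exams 2 and 4
    exam_index: 1, 2, 3 for the midterms and 4 for the final (assumed numbering)
    """
    if method == 2:
        base = question % 2 == 0
    elif method == 3:
        # 1, 4, 5, 8, 9, ... are the numbers whose remainder mod 4 is 1 or 0.
        base = question % 4 in (0, 1)
    else:
        raise ValueError("method must be 2 or 3")
    # The second midterm and the final use the reverse assignment.
    return base if exam_index in (1, 3) else not base
```

For example, `[q for q in range(1, 11) if in_even_half(q, 3, 1)]` lists the first-midterm questions that the third method places in the "even" test.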
  • 12
    • 33746588118
    • note
    • An offset to zero for each semester could be made so that all semesters had the same average percent difference in even and odd tests. This correction would account for the fact that students in different course semesters do not have the same even and odd tests. Adding this offset has the inherent effect of diminishing the standard deviations in the distributions to 3.2% for both the second and third methods of splitting the questions. This offset had little effect on the first splitting method.
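A per-semester offset of this kind amounts to subtracting each semester's mean even-minus-odd difference before pooling. The sketch below assumes the differences are held in a dictionary keyed by a semester label; the names and the data layout are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def centered_differences(diffs_by_semester: Dict[str, List[float]]) -> List[float]:
    """Pool even-minus-odd percent differences after removing each semester's mean."""
    centered: List[float] = []
    for diffs in diffs_by_semester.values():
        offset = mean(diffs)  # this semester's average even-odd difference
        centered.extend(d - offset for d in diffs)
    return centered
```

Removing the between-semester means leaves only the within-semester spread, so the standard deviation of the pooled differences can shrink but never grow.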
  • 14
    • 33746633263
    • note
    • A letter grade difference of 1.0 is equivalent to the difference between an A and a B or between a B and a C. A letter grade difference of 1/3 is equivalent to the difference between an A and an A- or between an A- and a B+.
  • 15
    • 33746645086
    • H. Wainer and D. Thissen, in True Score Theory: The Traditional Method, edited by David Thissen and Howard Wainer (Lawrence Erlbaum Associates, Hillsdale, NJ, 2001), Chap. 2, pp. 23-72.
    • (2001) True Score Theory: The Traditional Method , pp. 23-72
    • Wainer, H.1    Thissen, D.2
  • 17
    • 33746605911
    • note
    • Common convention is to desire reliability correlation coefficients greater than 0.80 to ensure that a student's exam score uncertainty is less than half of the standard deviation in the class' exam score distribution.
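The 0.80 threshold follows from the standard classical-test-theory relation between reliability and the standard error of measurement, SEM = SD × sqrt(1 − r). The note does not state this formula; it is supplied here as the usual textbook relation, and the function name is hypothetical.

```python
import math


def sem_in_sd_units(reliability: float) -> float:
    """Standard error of measurement expressed as a fraction of the score SD.

    Uses the classical relation SEM / SD = sqrt(1 - reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(1.0 - reliability)
```

At r = 0.80 the ratio is about 0.45, just under one half. Under this relation the ratio reaches exactly one half at r = 0.75, so 0.80 leaves a small margin.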
  • 18
    • 33845945922
    • Coefficient alpha and the internal structure of tests
    • L. J. Cronbach, Coefficient alpha and the internal structure of tests, Psychometrika 16, 297 (1951).
    • (1951) Psychometrika , vol.16 , pp. 297
    • Cronbach, L.J.1
  • 19
    • 33746645475
    • note
    • Because some of the exam items are grouped together under the same physical situation, splitting these items into separate split-half exams generally increases the correlation coefficient between the split-half exams and thus artificially increases the coefficient alpha. It may be more appropriate to treat those questions that are grouped together under the same prompt as testlets, and then to calculate alpha using testlet scores. To see what effect this might have on our alpha values, we examined four semester sets of exams: two from calculus-based mechanics and two from algebra-based mechanics. In each of the four semesters, the testlet alpha was indeed less than the item alpha, but never by more than 2% of the item alpha. This difference between the item and testlet alphas is less than the variation between semester item alphas.
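The comparison between item alpha and testlet alpha can be illustrated with the standard formula for coefficient alpha, α = k/(k − 1) × (1 − Σ part variances / total-score variance). The sketch below is not the authors' analysis code; the function names, the score-matrix layout, and the way testlets are specified are assumptions.

```python
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha, where scores[s][j] is student s's score on part j."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two parts")
    part_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(part_variances) / total_variance)


def to_testlets(scores: Sequence[Sequence[float]],
                groups: Sequence[Sequence[int]]) -> List[List[float]]:
    """Sum the item columns that share a prompt.

    groups lists the column indices belonging to each testlet.
    """
    return [[sum(row[j] for j in group) for group in groups] for row in scores]
```

Calling `cronbach_alpha(scores)` gives the item alpha, and `cronbach_alpha(to_testlets(scores, groups))` gives the testlet alpha for the same students.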
  • 21
    • 33746633655
    • note
    • One justification for this selection process is that if only A and F students participated in the study, correlations between multiple-choice and constructed-response scores would be artificially high. We wanted to make sure there was an even distribution of students in the letter grade range from A to C. This is the range of most interest to us, since it is in this range that students' course grades depend predominantly on exam performance. Students in the D to F range do poorly on all components of the course, not just the exams. To ensure that there was an equal number of students in each grade category, we chose to select only those students who had scored consistently on their three midterm exams. If a student receives an "A" on one midterm but then receives a "C" on another, one does not know whether this student is really an A, B, or C student.
  • 22
    • 33746626733
    • note
    • This weighting system was instituted to allow for partial credit. The five-option items are intended to be more difficult than two- and three-option items. Students can receive partial credit on a five-option item in one of the following ways: six points if only one option is chosen and is correct, three points if only two options are chosen and one of the chosen options is correct, two points if only three options are chosen and one of the chosen options is correct, and zero points for all other markings.
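The partial-credit rule for five-option items can be expressed as a short scoring function. The point values come from the note; the function name and the representation of a student's markings as a set of option labels are assumptions.

```python
from typing import Set


def score_five_option(chosen: Set[str], correct: str) -> int:
    """Score one five-option item under the partial-credit rule in the note."""
    # Points awarded when the correct option is among the chosen ones,
    # keyed by how many options the student marked.
    points_by_count = {1: 6, 2: 3, 3: 2}
    if correct not in chosen:
        return 0
    # Marking four or five options earns nothing, even if one is correct.
    return points_by_count.get(len(chosen), 0)
```

For instance, `score_five_option({"A", "B"}, "B")` returns 3, while marking four options that include the correct one returns 0.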
  • 23
    • 33746616781
    • note
    • To address any concerns that these raw correlations are large because of the selection of students who participated in the study, there is a correction that can be made to estimate what the raw correlations would be if the students were a purely random sample of the entire class. This correction for heterogeneity had little effect on our raw correlations: for group 1, r=0.88 went to 0.90, and for group 2, r=0.92 went to 0.89. We were able to test the validity of this correction using our reliability data and found that, on average, it predicted a value at most 0.62% ± 0.07% above the actual value.
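The note does not say which formula was used for the heterogeneity correction. One widely used textbook correction for a sample whose spread differs from that of the full group is the range-restriction formula r_c = r·u / sqrt(1 − r² + r²·u²), where u is the ratio of the full-group standard deviation to the sample standard deviation. The sketch below implements that formula purely as an illustration; that it matches the authors' correction is an assumption, and the function and parameter names are hypothetical.

```python
import math


def correct_for_heterogeneity(r: float, sd_full: float, sd_sample: float) -> float:
    """Estimate the full-group correlation from a sample correlation.

    Assumed formula (standard range-restriction correction):
        r_c = r * u / sqrt(1 - r**2 + r**2 * u**2),  u = sd_full / sd_sample
    """
    u = sd_full / sd_sample
    return r * u / math.sqrt(1.0 - r**2 + (r**2) * (u**2))
```

Under this formula, a sample that is less spread out than the class (u > 1) has its correlation raised and a sample that is more spread out (u < 1) has it lowered, so the correction can move a raw correlation in either direction.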
  • 24
    • 0008951798
    • Educational Measurement, edited by R. L. Thorndike (American Council on Education, Washington, D.C., 1971).
    • (1971) Educational Measurement
    • Thorndike, R.L.1
  • 25
    • 33746656116
    • note
    • Using the heterogeneity correction, the raw correlation values between MC and CS went from 0.78 and 0.83 to 0.81 and 0.77 for groups 1 and 2, respectively.
  • 27
    • 0000190679
    • Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving
    • P. Heller and M. Hollabaugh, Teaching problem solving through cooperative grouping. Part 1: Group versus individual problem solving, Am. J. Phys. 60, 627 (1992).
    • (1992) Am. J. Phys. , vol.60 , pp. 627
    • Heller, P.1    Hollabaugh, M.2
  • 28
    • 0000190679
    • Teaching problem solving through cooperative grouping. Part 2: Designing problems and structuring groups
    • P. Heller and M. Hollabaugh, Teaching problem solving through cooperative grouping. Part 2: Designing problems and structuring groups, Am. J. Phys. 60, 637 (1992).
    • (1992) Am. J. Phys. , vol.60 , pp. 637
    • Heller, P.1    Hollabaugh, M.2
  • 29
    • 33746621176
    • note
    • Full credit for a two-choice, three-choice, or five-choice question is two points, three points, or six points, respectively. See end-note in the subsection "The Study" of the Validity section for an explanation of the weighted grading system.
  • 30
    • 33746619513
    • For more examples of questions used in our exams, visit the Illinois Physics Education Research Group's website at http://www.physics.uiuc.edu/Research/PER/ and click on the "Resources" link. Researchers and teachers can gain free access to all of the midterm exams used in the introductory courses in recent years.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.