 © 2008 American Dental Education Association
Abstract
A standard correction for random guessing on multiple-choice examinations was implemented prospectively in an Oral and Maxillofacial Pathology course for second-year dental students. The correction was a weighted scoring formula for points awarded for correct answers, incorrect answers, and unanswered questions such that the expected gain in the multiple-choice examination score due to random guessing was zero. An equally weighted combination of four examinations using equal numbers of short-answer questions and multiple-choice questions was used for student evaluation. Scores on both types of examinations, after implementation of the correction for guessing on the multiple-choice component (academic year 2005–06), were compared with the previous year (academic year 2004–05) when correction for guessing was not used for student evaluation but was investigated retrospectively. Academically, the two classes were comparable as indicated by the grade distributions in a General Pathology course taken immediately prior to the Oral and Maxillofacial Pathology course. Agreement between scores on short-answer examinations and multiple-choice examinations was improved in the 2005–06 class compared with the 2004–05 class. Importantly, the test score means were higher on both the short-answer and multiple-choice examinations in the Oral and Maxillofacial Pathology course, and the standard deviations were significantly smaller in 2005–06 compared to 2004–05; these differences reflected an upward shift in the lower part of the grade distributions to higher grades in 2005–06. Furthermore, when students were classified by their grade in the General Pathology course, students receiving a C (numerical grade of 70–79 percent) in General Pathology had significantly improved performance in the Oral and Maxillofacial Pathology course in 2005–06, relative to 2004–05, on both short-answer and multiple-choice examinations, representing an aptitude-treatment interaction. We interpret this improved performance as a response to a higher expectation imposed on the 2005–06 students by the prospective implementation of correction for guessing.
Keywords: aptitude-treatment interaction, validity, formula scoring, correction for guessing, educational methodology, educational measurement, student performance, evaluation, multiple-choice questions, short-answer questions, dental education
The expectation for students to improve their scores by guessing on multiple-choice format examinations is well known. We previously reported the results of retrospectively applying a standard correction for random (no knowledge) guessing^{1–3} to the scores on multiple-choice examinations in an Oral and Maxillofacial Pathology course for dental students.^{4} We found increased agreement of corrected scores from multiple-choice examinations with scores on short-answer examinations, that is, increased validity. We take as self-evident that the short-answer format examination greatly reduces the potential for guessing the correct answer. In this article, we report the results of a prospective implementation of the correction for guessing in which students were told in advance that a correction for guessing would be applied to their multiple-choice examination scores.
We wanted to accomplish three objectives by prospective implementation of a correction for guessing. The first would be improved validity for the multiple-choice examinations reflected by better agreement of multiple-choice scores with the short-answer scores; this improved validity for the multiple-choice examinations would provide a better assessment of a student’s knowledge. The second would be that a random component of the scores (luck) would be reduced, resulting in increased reliability of the multiple-choice examinations. The third objective would be a change in students’ behavior toward self-recognition and acknowledgment of what they do not know. This ability to recognize and the willingness to admit what they do not know are essential attributes for future health care professionals.
Students, as well as some faculty colleagues, expressed concern that using correction for guessing would result in lower grades and thereby jeopardize a student’s academic record; we shared that concern. However, the results of this study showed that students’ grades were not adversely affected, but to the contrary, student performance actually improved after implementation of correction for guessing.
Methods
This study was designed to evaluate the effects of prospectively applying a correction for guessing to test scores of multiple-choice examinations. Student scores from an Oral and Maxillofacial Pathology course were compared between two academic class years. The courses in the two years were the same except that, in the second class, students were informed that correction for guessing would be employed in calculating the multiple-choice examination scores. Student course scores were derived from both short-answer and multiple-choice examinations. Scores in the prerequisite General Pathology course were used to show that the two classes were academically comparable. We statistically compared student performance in the Oral and Maxillofacial Pathology course between the two classes overall and also within grade classification in the General Pathology course to assess the effect of correction for guessing on students of different ability. Statistical analyses were also conducted to evaluate the effect of correction for guessing on the validity of multiple-choice examinations and on the reliability of examinations.
The correction for guessing that we investigated was a modification to the common grading method for multiple-choice examinations (number-correct or number-right scoring), in which 0 points are assigned for an incorrect answer and full credit is given for a correct answer.^{4,5} In the multiple-choice examinations we investigated, each multiple-choice question had five possible answers. The standard correction for guessing consisted of awarding −1/4 for an incorrect answer, 0 for a question not answered, and +1 for a correct answer. Assuming random selection, the probability of guessing the single correct answer was 1/5 (0.20), and the probability of guessing an incorrect answer was 4/5 (0.80); thus, a student was expected to have guessed an incorrect answer four times as often as he or she guessed a correct answer. Therefore, using the standard correction for guessing, the expected value of the number of points gained due to random guessing was (0.20)(1) + (0.80)(−1/4) = 0. In general, for K possible answers per question, −1/(K − 1) is awarded for an incorrect answer, 0 for a question not answered, and +1 for a correct answer.^{5} This correction for guessing is generally referred to as formula scoring^{5} or as the standard correction for guessing. Formula scoring is a special case^{6} of choice weighting.
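The scoring rule described above can be written out as a short calculation. The following is a minimal sketch, not the grading software used in the course; the function names and the example student are hypothetical and serve only to illustrate the arithmetic for K = 5.

```python
from fractions import Fraction


def formula_score(n_correct, n_incorrect, k=5):
    """Return the formula score for one examination.

    Each correct answer earns +1, each incorrect answer earns -1/(k - 1),
    and omitted questions earn 0, so they do not enter the calculation.
    """
    return n_correct - Fraction(n_incorrect, k - 1)


def expected_gain_random_guess(k=5):
    """Expected points gained from one purely random guess among k options."""
    p_correct = Fraction(1, k)
    p_incorrect = 1 - p_correct
    return p_correct * 1 + p_incorrect * Fraction(-1, k - 1)


# Hypothetical student: 50 questions, 40 correct, 8 incorrect, 2 omitted.
raw_points = formula_score(40, 8)        # 40 - 8/4
percent = float(raw_points) / 50 * 100   # percent of the 50 available points
```

Exact fractions are used so that the zero expected gain is not obscured by floating-point rounding.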
The correction for guessing was implemented prospectively in the Oral and Maxillofacial Pathology course at the University of Texas Health Science Center at San Antonio in the academic year 2005–06. The rationale and method of correction for guessing were explained in the course syllabus and by faculty who met with students during the introductory lecture to explain the procedure and answer questions. The correction for guessing in the 2004–05 Oral and Maxillofacial Pathology course was investigated retrospectively.
The results in this report are based on observations made in the Oral and Maxillofacial Pathology course in two different academic years. The only planned difference was that correction for guessing was used in the scoring of multiple-choice examinations. Any other changes between the two years, such as changes in course content or emphasis and changing or repeating of examination questions, were changes that typically occur from year to year.
The Oral and Maxillofacial Pathology course at the University of Texas Health Science Center at San Antonio, given in the spring semester to all second-year dental students, was fifty-eight hours in length and consisted of fifty hours of lecture and four two-hour examinations. Each of the four examinations was divided into two one-hour examinations.
The first hour of each examination was based on the presentation of twenty-five clinical cases. Each case consisted of a brief written clinical history and projected clinical, microscopic, and/or radiographic findings. Each student was given a written examination consisting of the brief clinical histories corresponding to the twenty-five clinical cases that would be projected. The instructor projected the first clinical case while the students read the corresponding clinical history. The students were given several minutes to look at the projected clinical case and formulate a written response. After a period of time, as determined by the instructor, the instructor asked the class if any student needed more time before the next clinical case was projected. If even one student raised a hand, more time was given before the next clinical case was projected. When all twenty-five clinical cases were projected, the students were given any remaining time in the one-hour block to review their written responses to all of the twenty-five clinical cases and make any changes or corrections. No clinical cases were projected more than once. For each disease taught in the course, students were expected to learn the salient clinical characteristics, etiology/pathogenesis, radiographic features (if appropriate), histopathologic findings, and pertinent treatment options/prognosis. For each of the twenty-five cases in the four examinations, two short-answer questions were asked for a total of fifty questions. Students were advised to respond succinctly to the short-answer questions and not to use verbose essay-type responses. Responses to short-answer questions typically consisted of one or more sentences or several key words. The short-answer examinations were collected and subsequently graded by the course director (ACJ). The short-answer questions were graded by identifying key words delineated at the time of construction of the examination. Points were not deducted for spelling errors as long as responses were phonetically correct. If a student gave several answers, only the first answer was evaluated; no partial credit was awarded.
The second hour of each examination consisted of fifty multiple-choice questions, each with one correct answer and four plausible distractors. The multiple-choice questions were a mixture of clinical vignettes and didactic questions. Students were asked to choose the single correct answer for each question. At the end of the second hour, the multiple-choice examination answer sheets were collected and graded electronically.
Since the multiple-choice and short-answer examinations each consisted of fifty questions, they were equally weighted in the calculation of each student’s final grade. Each of the four two-hour examinations comprised 25 percent of the final grade. No comprehensive final examination was given. Students received final course grades based on averages calculated from the scores on the four one-hour short-answer examinations and the four one-hour multiple-choice examinations. These averages were used to assign course letter grades as A (90–100 percent), B (80–89 percent), C (70–79 percent), or F (0–69 percent). These arbitrary but commonly used grade cutpoints are used throughout this report to classify students.
Each of the four examinations covered between eleven and thirteen hours of lecture material. When the individual short-answer and multiple-choice examinations were constructed, the questions were equally weighted to the topics that were presented prior to each of the four examinations. This was to ensure that a given topic was not stressed more often than another topic. The students were advised to add up the number of topics discussed in a given section and divide 50 by that number to arrive at an approximate number of questions per topic on both the multiple-choice and short-answer examinations.
The General Pathology course at the University of Texas Health Science Center at San Antonio, given in the fall semester to all second-year dental students, immediately precedes the Oral and Maxillofacial Pathology course. The course was seventy-seven hours in length and consisted of sixty-one hours of lecture, four two-hour review sessions, and four two-hour examinations. The review sessions were structured in a question and answer format. Each faculty member who had previously presented didactic information for the upcoming examination presented a brief verbal review of his or her topic. Students were then allowed to ask questions, and topics were discussed in further detail by the faculty member. This procedure was repeated until there were no further questions. Each of the examinations consisted of seventy-five multiple-choice questions with one correct answer and four distractors; test construction strategies were similar to those described for the Oral and Maxillofacial Pathology course. The multiple-choice questions covered information presented in the lectures and reading assignments. Each two-hour examination comprised 25 percent of the final course grade. No comprehensive final examination was given. Students received a final course grade based on the averages calculated from the four two-hour examinations. These averages were used to assign course grades using the same categories as in the Oral and Maxillofacial Pathology course. Each of the four examinations covered between thirteen and nineteen hours of lecture material. When the multiple-choice questions were constructed, the questions were equally weighted to the topics presented prior to each of the four examinations. This was to ensure that a given topic was not stressed more often than another. The students were advised to add up the number of topics discussed in a given section and divide 75 by that number in order to arrive at an approximate number of questions per topic.
Ninety students initially enrolled in the Oral and Maxillofacial Pathology course during the 2004–05 academic year; two students who were failing the course after the completion of three examinations withdrew from school before the fourth examination. Eighty-eight students were enrolled in the Oral and Maxillofacial Pathology course in 2005–06. Three students who were repeating the course, one student who did not take the third examination, and one student who took General Pathology out of sequence were excluded from the 2005–06 class. Thus, the analyses presented in this report were based on eighty-eight students (2004–05) and eighty-three students (2005–06) who completed the General Pathology course prior to enrolling in the Oral and Maxillofacial Pathology course and who subsequently had scores for all four examinations in the Oral and Maxillofacial Pathology course. This study was approved by the Institutional Review Board of the University of Texas Health Science Center at San Antonio.
Means were compared between classes using the t-test for independent samples, and standard deviations (variances) were compared using the F-test.^{7} If variances were unequal, Satterthwaite’s modification to the t-test was used.^{8} Cumulative relative frequency distributions were used to graphically show distributions of numerical scores. Frequencies in letter grade groups (A, B, C, or F) were analyzed using the chi-square test for independence.^{7} Where low expected frequencies were encountered, an exact procedure was used to obtain the P-value. We classified students based on their letter grade (A, B, C, F) in the General Pathology course (a measure of student aptitude) and then analyzed their numerical scores in the Oral and Maxillofacial Pathology course by these classifications and class year (retrospective or prospective correction for guessing, that is, the treatment) in a two-way analysis of variance^{7} to investigate aptitude-treatment interactions.^{9}
Aggregate agreement^{10} of scores from multiple-choice examinations and scores from short-answer examinations was assessed using principal component lines^{11} estimated from the variance-covariance matrix. The first principal component is the line through the means (X̄, Ȳ) that minimizes the sum of the squared perpendicular distances of the data points to the line.^{12} We used principal components analysis because both the X and Y variables were random variables. In linear regression analysis, only the Y variable is considered to be a random variable, and the estimator of the line is biased if X also is a random variable.^{13} Thus, the first principal component lines more accurately estimate the relation between these X and Y variables.
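The unequal-variance comparison of means can be illustrated from summary statistics alone. The sketch below uses the commonly cited form of the Satterthwaite approximation for the degrees of freedom; it is an assumption that this matches the authors' software, and the function name is hypothetical. The inputs are the short-answer class means, standard deviations, and class sizes reported in this article.

```python
import math


def satterthwaite_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic and Satterthwaite degrees of freedom."""
    a = sd1 ** 2 / n1   # squared standard error contribution, class 1
    b = sd2 ** 2 / n2   # squared standard error contribution, class 2
    t = (mean2 - mean1) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Short-answer scores: 82.8 +/- 8.6 (n = 88) versus 84.8 +/- 6.4 (n = 83).
t_stat, df = satterthwaite_t(82.8, 8.6, 88, 84.8, 6.4, 83)

# Variance ratio for the F-test, larger variance in the numerator.
f_ratio = 8.6 ** 2 / 6.4 ** 2
```

Converting the t statistic to a P-value requires the t distribution, which is not in the Python standard library, so that step is left out of this sketch.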
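For two variables, the first principal component line has a closed form in terms of the variances and covariance. The following minimal sketch assumes sample variances with an n − 1 divisor and a nonzero covariance; the function name and the toy data in the usage note are hypothetical.

```python
import math


def first_pc_line(x, y):
    """Slope and intercept of the first principal component (major axis) line.

    The line passes through the means and follows the eigenvector of the
    2 x 2 variance-covariance matrix that has the largest eigenvalue.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("covariance is zero; the slope is not defined here")
    diff = s_yy - s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, `first_pc_line([1, 2, 3, 4], [2, 4, 6, 8])` recovers a slope of 2 and an intercept of 0 up to rounding, because the points lie exactly on that line. Unlike ordinary regression, swapping the roles of x and y gives the same line (with the reciprocal slope).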
A bootstrap procedure,^{14} with 1,000 samples, was used to estimate confidence intervals for the slope and intercept of the first principal component lines, to test that the slope was 1.00 and the intercept 0.00, and to test equality of the principal component lines for the 2004–05 and 2005–06 classes. The bootstrap is a nonparametric procedure and thus does not depend on any particular probability distribution. The statistic of interest is calculated in bootstrap samples, of the same size as the original, that are generated by sampling with replacement from the original data. Thus, the bootstrap is a resampling procedure. If the resampling is repeated a large number of times, the empirical distribution of the statistic generated from many bootstrap samples approximates the actual distribution. The empirical distribution may be used to construct confidence intervals (95 percent confidence limits are the 2.5 and 97.5 percentiles of the empirical distribution) or perform hypothesis tests.
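The percentile bootstrap described above can be sketched for the slope of the first principal component line. This is an illustration under stated assumptions, not the authors' code: the function names, the fixed seed, and the handling of degenerate resamples (those with zero covariance are redrawn) are choices made for this example.

```python
import math
import random


def pc_slope(pairs):
    """Slope of the first principal component line, or None if undefined."""
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    s_xx = sum((p[0] - mean_x) ** 2 for p in pairs) / (n - 1)
    s_yy = sum((p[1] - mean_y) ** 2 for p in pairs) / (n - 1)
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs) / (n - 1)
    if s_xy == 0:
        return None
    diff = s_yy - s_xx
    return (diff + math.sqrt(diff ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)


def bootstrap_slope_ci(pairs, n_boot=1000, seed=0):
    """Approximate 95 percent percentile interval for the slope."""
    if pc_slope(pairs) is None:
        raise ValueError("slope is undefined for the original data")
    rng = random.Random(seed)
    n = len(pairs)
    slopes = []
    while len(slopes) < n_boot:
        # Resample students (score pairs) with replacement, same size as original.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        slope = pc_slope(sample)
        if slope is not None:
            slopes.append(slope)
    slopes.sort()
    lower = slopes[round(0.025 * n_boot)]
    upper = slopes[round(0.975 * n_boot) - 1]
    return lower, upper
```

Each element of `pairs` would be one student's (short-answer score, multiple-choice score); resampling whole pairs keeps the two scores of a student together. A slope of 1.00 lying inside the resulting interval corresponds to the test that the slope does not differ from 1.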
The Cronbach’s alpha statistic was used as a measure of reliability. Carmines and Zeller^{15} describe the Cronbach’s alpha statistic as an estimate of the expected correlation between one test and a hypothetical alternative form containing the same number of items.
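Cronbach's alpha has a standard closed form based on item variances and the variance of the total score. The sketch below assumes a students-by-items table of scores; the article does not state which units were treated as items, so the layout, function name, and toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with one row per student, one column per item.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: four students, three items scored 1 (correct) or 0 (incorrect).
toy = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy)
```

In the toy table every item ranks the students identically, so alpha is at its maximum of 1 (up to rounding); less consistent items pull the value down.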
Results
To determine whether the 2004–05 and 2005–06 classes were academically comparable, we compared their respective performances in the General Pathology course taken in the semester immediately preceding the Oral and Maxillofacial Pathology course. The distributions of individual student course scores are shown in Figure 1. This figure shows the cumulative relative frequencies, that is, the fraction of students at a particular grade average and below. Little difference in distributions of grade averages between the 2004–05 and 2005–06 classes in the General Pathology course is indicated. This lack of difference was further reflected quantitatively; the class means were 81.6 ± 6.4 (SD) in 2004–05 and 80.9 ± 5.6 in 2005–06 (P=0.4369). Further analysis also showed there was no significant difference in letter grade distributions (A, B, C, or F) between the 2004–05 and 2005–06 classes (Table 1); that is, the fractions of students receiving A, B, C, or F in the two classes were similar.
The distributions of individual student scores on the short-answer and multiple-choice examinations for the Oral and Maxillofacial Pathology course are shown in Figure 2. This figure shows that, after correction for guessing was implemented prospectively, the lower part of the distributions of both short-answer and multiple-choice scores was clearly higher (curve shifted to the right) in the 2005–06 class compared to the 2004–05 class. The class means of the short-answer examination scores were 82.8 ± 8.6 (SD) in 2004–05 and 84.8 ± 6.4 in 2005–06 (P=0.0895); and the class means of the multiple-choice examination scores were 81.6 ± 7.7 in 2004–05 (retrospectively corrected for guessing) and 83.5 ± 6.1 in 2005–06 (P=0.0722). The resultant overall course means were 82.2 ± 7.7 in 2004–05 and 84.1 ± 5.6 in 2005–06 (P=0.0607). The standard deviations of short-answer examination scores, multiple-choice examination scores, and overall course scores were significantly smaller in 2005–06 than in 2004–05 (P ≤ 0.0387). The shift to higher grades by the lower-performing students noted in Figure 2 was responsible for both the higher means and the smaller standard deviations in the 2005–06 class relative to the 2004–05 class.
The upward shift in the lower part of the grade distribution in 2005–06 was further examined by comparing the fraction of students in each of the grade categories A, B, C, and F between the two classes (Table 2). The grade distributions differed between the 2004–05 and 2005–06 classes for overall grade (average of multiple-choice and short-answer scores; P=0.0100) and for short-answer grade (P=0.0190), with a weaker difference for multiple-choice grade (P=0.0559). Of note, for overall grade, short-answer grade, and multiple-choice grade there were significantly (P ≤ 0.0305) lower fractions of Cs and Fs and increased fractions of As and Bs in 2005–06 compared to 2004–05. Clearly, prospective implementation did not adversely affect the students’ grades, but, in fact, resulted in improved student performance in 2005–06 relative to 2004–05.
The relationships between an individual student’s General Pathology course score and his or her subsequent scores on the short-answer and multiple-choice examinations in the Oral and Maxillofacial Pathology course are shown in Figure 3. The “C” students in the General Pathology course are shown in the shaded areas. Clearly, the subsequent performances in the Oral and Maxillofacial Pathology course by the 2005–06 “C” General Pathology students were higher than the performances of the 2004–05 “C” General Pathology students; this is indicated by the observation that the number of filled circles (representing grades for 2005–06 students) in the shaded area that shifted to A or B was much greater than the number of open circles (representing grades for 2004–05 students) that shifted higher to A or B. To verify this visual impression, we analyzed the numerical scores in the Oral and Maxillofacial Pathology course between the two classes with students classified by their letter grade in the General Pathology course. These analyses showed significant aptitude-treatment interactions^{9} between the letter grade in the General Pathology course and class year (short-answer, P=0.0447; multiple-choice, P=0.0246; and course average, P=0.0113). Table 3 shows that those students with a C in the General Pathology course had higher average scores (short-answer, multiple-choice, and overall) in the Oral and Maxillofacial Pathology course in the 2005–06 class compared to the 2004–05 class. This significant difference is reflective of an aptitude-treatment interaction because the difference between classes was observed only in the C students and not in the A or B students; that is, the improvement depends on the student’s ability or aptitude as measured by the grade in the General Pathology course. Furthermore, in 2004–05, 37.8 percent (14/37) of students with a C in the General Pathology course subsequently received a B (no As) in the Oral and Maxillofacial Pathology course, whereas in 2005–06, 58.5 percent (24/41) of students with a C in the General Pathology course received an A or B in the Oral and Maxillofacial Pathology course (P=0.0678). Of the students receiving a B in the General Pathology course, the fractions that improved to an A in the Oral and Maxillofacial Pathology course were similar in the two classes (33.3 percent [12/36] in 2004–05; 27.3 percent [9/33] in 2005–06; P=0.5847).
We were concerned that students in the 2005–06 Oral and Maxillofacial Pathology course might have responded to the imposition of the correction for guessing by seeking information about examinations from prior classes. In this regard, examining the grades that the students achieved on the same questions as well as different questions in the two years showed that the improved performance by the 2005–06 class in the Oral and Maxillofacial Pathology course was not due to information about the examination questions obtained from prior classes (Table 4).
One objective was to assess the effect of the correction for guessing on the validity of the multiple-choice examinations. The agreement of the scores of the individual 2005–06 students on the multiple-choice examinations with the scores on the short-answer examinations in the Oral and Maxillofacial Pathology course is shown graphically in Figure 4; and the equation for the first principal component line, describing aggregate agreement, is given in Table 5. The line for the trend in Figure 4 is close to the line of equality, indicating good validity of the corrected multiple-choice examination scores. As shown in Table 5, the slope was not significantly different from 1 and the intercept was not significantly different from 0. Also given in Table 5 is the equation for the first principal component line previously reported^{4} from the retrospective application of the correction for guessing. The slope of the line obtained after prospective implementation of correction for guessing was closer to 1 and the intercept closer to 0 than was the line obtained after retrospective application of correction for guessing; however, these differences were not statistically significant.
For the Oral and Maxillofacial Pathology course, the Cronbach’s alpha statistics, which measure examination reliability, were 0.89 in 2004–05 and 0.87 in 2005–06 for the multiple-choice examinations and likewise 0.89 in 2004–05 and 0.87 in 2005–06 for the short-answer examinations. Thus, reliabilities of the examinations in the 2004–05 and 2005–06 classes were adequate (≥ 0.80) as defined by Carmines and Zeller.^{15}
An objective of using the correction for guessing was to encourage students to recognize and admit what they did not know. Eighty-nine percent (88.9 percent) of students in the 2005–06 Oral and Maxillofacial Pathology course with numerical course scores of 70–79 percent left at least one multiple-choice question unanswered, 56.3 percent of students with scores of 80–89 percent left at least one question unanswered, and 26.7 percent of students with scores of 90–100 percent left at least one question unanswered. Students with course scores of 70–79 percent left a mean of 3.50 multiple-choice questions unanswered (from a total of 200 multiple-choice questions), while students with course scores of 80–89 percent left 2.02 multiple-choice questions unanswered and students with 90–100 percent left 0.60 questions unanswered. Of the questions that students did not know (those left unanswered plus those answered incorrectly), the average percentages left unanswered were 9.0 percent, 6.9 percent, and 4.0 percent for students with numerical course scores of 70–79 percent, 80–89 percent, and 90–100 percent, respectively.
Discussion
We were surprised and pleased to learn that lower-performing “C” students in our General Pathology course significantly improved their grades in the Oral and Maxillofacial Pathology course after we prospectively implemented the correction for guessing. Our interpretation of this aptitude-treatment interaction^{9} is that the improved performance, in both the short-answer and multiple-choice examinations, was because the 2005–06 students were concerned about the consequences of the correction for guessing, and as a result, they studied more diligently. In doing so, they attained a better overall understanding of the subject matter. Thus, the improved performance represented a positive (behavioral) attitude of the 2005–06 students in response to our raising of the bar.^{16}
We arrived at this interpretation of the improved student performance under correction for guessing by excluding other potential explanations. The comparability of the examination scores and final grades between the two classes in our General Pathology course supports the premise that the two classes were not academically different and that such a difference did not account for the improved performance in the Oral and Maxillofacial Pathology course by the 2005–06 students. The improved performance on both the same questions and different questions suggests that the improvement in 2005–06 was not because examination questions were passed along from previous classes.
Our objective that students would not only recognize but acknowledge what they do not know was, at best, only modestly achieved as indicated by the relatively small number and fraction of questions left unanswered. There are several potential explanations for these findings. The standard correction for guessing adjusted only for truly random guessing among five possible answers. Thus, it potentially would benefit students to guess if they could eliminate one, two, or three of the distractors, which has been a concern with multiple-choice questions.^{17} Prihoda et al.^{4} previously presented the expected gain per question if a student had partial knowledge and could eliminate one or more of the distractors. One student in the 2005–06 class publicly advised his classmates to continue to guess; his argument was based on expected gain and he did not consider the probability of students guessing themselves into a lower grade.
There was a slight improvement in agreement between multiple-choice and short-answer scores but little or no change in reliability. The principal component line (that is, the single dimension that best summarizes the data from both multiple-choice and short-answer examinations) was closer, although not significantly closer, to the line of equality after prospective implementation of the correction for guessing than was the line reported by Prihoda et al.^{4} (Table 5) after retrospective correction. These results continue to support increased validity^{18} due to applying the standard correction for guessing to multiple-choice examination scores. Our use of validity refers to performance without “cuing” in the short-answer examination. While we cannot claim that the short-answer examination better evaluates student knowledge based on these data only, we believe this question format, which reduces the influence of guessing, will be a better indicator of what students know or do not know about a given subject. Furthermore, the short-answer question format as used in the Oral and Maxillofacial Pathology course not only tests knowledge of memorized facts,^{19} but also requires critical thinking to integrate clinical characteristics, radiographic features, and histopathologic findings to arrive at an appropriate diagnosis of various disease processes. We are convinced that a short-answer examination provides a better measure of a student’s ability to perform in clinical situations than does a multiple-choice examination.
The correction for guessing could be implemented in any course that uses multiple-choice examinations; implementation would be most effective if used in all applicable courses within an institution. In order for the numerical value subtracted for incorrect responses in applying the correction for guessing to be accurate, it is imperative that functional distractors be used, that is, students must not be able to easily eliminate incorrect answers. The short-answer format examination can be used in any didactic course although grading such examinations requires extensive effort by the faculty.
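The incentive to guess with partial knowledge follows directly from the scoring rule. The sketch below is a straightforward calculation under the assumption that the student guesses uniformly among the answers not eliminated; it is not a reproduction of the values presented by Prihoda et al., and the function name is hypothetical.

```python
from fractions import Fraction


def expected_gain_after_elimination(n_eliminated, k=5):
    """Expected points per question when guessing among the remaining options.

    A correct guess earns +1 and an incorrect guess earns -1/(k - 1).
    """
    remaining = k - n_eliminated
    p_correct = Fraction(1, remaining)
    penalty = Fraction(1, k - 1)
    return p_correct - (1 - p_correct) * penalty


# Expected gain for 0, 1, 2, or 3 distractors eliminated (five options).
gains = {m: expected_gain_after_elimination(m) for m in range(4)}
# gains: 0 -> 0, 1 -> 1/16, 2 -> 1/6, 3 -> 3/8
```

The expected gain is zero only for a fully random guess and grows with each distractor eliminated. An expected value says nothing about the spread of outcomes, which is the risk of guessing into a lower grade noted above.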
Conclusions
Prospective implementation of correction for guessing in multiple-choice examinations in an Oral and Maxillofacial Pathology course resulted in significantly better student performance as indicated by improvements in numerical scores and by letter grade distribution on both short-answer and multiple-choice examinations. Moreover, prospective implementation of correction for guessing resulted in improved validity of multiple-choice examinations. This study supports the premise that these health professions students, qualified through a competitive selection process, respond positively to an increase in expectation for student performance and that health professions faculty should not be reluctant to raise the bar for them.
Acknowledgments
The authors gratefully acknowledge Ms. Belen Ballesteros for her excellent management of the database of student test scores that were used in this study.
Footnotes

Dr. Prihoda is Associate Professor, Department of Pathology; Dr. Pinckard is Professor, Department of Pathology; Dr. McMahan is Professor, Department of Pathology; Dr. Littlefield is Director, Academic Center for Excellence in Teaching; and Dr. Jones is Professor, Department of Pathology, all at the University of Texas Health Science Center at San Antonio. Direct correspondence and requests for reprints to Dr. Anne Cale Jones, Department of Pathology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, TX 78229-3900; 210-567-4122 phone; 210-567-2303 fax; jonesac@uthscsa.edu.