Abstract
The development and dissemination of meaningful and useful performance reports associated with examinations involved in the licensure process are important to the communities of interest, including state boards, candidates, and professional schools. Discussions of performance reporting, however, have been largely neglected. The authors recognize and reinforce the need for such discussions by providing prototypes of performance reporting in dentistry with examples and recommendations to guide practice. For illustrative purposes, this article reviews and discusses the different reporting models used over the past ten years with Part I and Part II of the National Board Dental Examination (NBDE). These reporting models are distinguished by such features as the following: 1) scores in each discipline covered on the exam (four for Part I and nine for Part II) and an overall average are reported in a standard-score metric; 2) a single overall score in a standard-score metric is reported; and 3) performance on the exam is reported as pass/fail. Standard scores on the NBDE range from 49 to 99, with 75 being a passing score. Sample data, without identifying information, are used to illustrate the reporting models.
- dental education
- licensure
- assessment
- performance reports
- reporting models
- National Board Dental Examination
While the testing industry has largely focused on addressing the psychometric challenges associated with examinations, it has paid little, if any, attention to test performance reporting.1,2 Given the impact of test performance on decision making, developing reports of exam performance that are useful and meaningful becomes very important to the communities of interest. The Standards for Educational and Psychological Testing (Standards 5.0–5.7 and 6.10–6.16) state clearly that when exam results are released, those responsible should provide appropriate interpretations.3 Few studies related to licensure exams build a framework to support or justify the performance reporting model.4
In this article, we recognize and reinforce the need for such discussions by providing prototypes of performance reporting in dentistry, with examples and recommendations to guide practice and professional schools. We begin with a look at the primary mission of the Joint Commission on National Dental Examinations (JCNDE), a mission that directed the exam format and initial reporting models. Next, we address the need to provide information to dental schools and candidates regarding performance on the exams, a secondary factor that drove the history of performance reporting. Finally, we present different reporting models and comment on the pros and cons of each.
The JCNDE
The JCNDE is the agency responsible for the development and administration of the National Board Dental Examination (NBDE) and the National Board Dental Hygiene Examination (NBDHE). The mission of the JCNDE is to develop and conduct highly reliable, state-of-the-art cognitive exams that assist regulatory agencies in making valid decisions regarding licensure of oral health care professionals; to develop and implement policy for the orderly, secure, and fair administration of its exams; and to serve as a leader and resource in assessment for the oral health care professions.5–7
Although state boards of dentistry are the primary beneficiaries of the board exams, other stakeholders derive secondary benefits from candidate performance: exam results also mirror institutional and individual achievement. Once board exams moved from a graded to a pass/fail format in 2012, candidates and institutions could no longer glean additional information from student performance. As observed by Haladyna and Kramer, other stakeholders such as educational institutions and candidates needed more than pass/fail information for the purposes of improving teaching, remediation, and learning.8 Most institutions and candidates needed to know their areas of weakness and strength and ways to improve both educational goals and candidate performance.
As a result, the JCNDE has developed a variety of reporting models to address some of the needs of educational institutions and candidates. These reporting models are distinguished by reporting of a standard score in each tested discipline (four on Part I and nine on Part II); reporting of an overall standard score; and reporting of a single overall pass/fail status. This article uses sample performance reports of dental board exams sent to accredited dental schools for purposes of illustration and discussion.
Reporting on the NBDE
The overall design of the reporting models for the NBDE involves four elements: 1) the exam (Part I or Part II); 2) the nature of the exam (battery or comprehensive); 3) the frequency of distributing school reports (monthly or annually); and 4) the reporting model used. A summary of the overall design is shown in Table 1.
Reporting models for school performance on NBDE Parts I and II
Scoring and Reporting
On the NBDE, a score of 1 is awarded for a correct response and a score of 0 for an incorrect response; there is no penalty for an incorrect response. A candidate's raw score is the number of correct responses, which is then converted to a standard score that adjusts for minor differences in difficulty across test forms. The Rasch model is used to scale raw scores and make the appropriate adjustments.
The standard scores range from 49 to 99, with a score of 75 representing the minimum passing standard score for all test forms administered to the candidates. Because NBDE Parts I and II are criterion-referenced exams, the minimum passing score is determined by experts through standard setting activities.9,10 The standard is reset every five to seven years. A candidate’s pass/fail status is determined by his or her standard score on each discipline for the battery exam and overall for the comprehensive exam. Detailed technical information and research related to the standard setting procedure used with Parts I and II were reported by Kramer and DeMarais and Tsai et al., respectively.9,10
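To make the raw-to-standard conversion described above concrete, the sketch below illustrates, under stated assumptions, how a raw score could be mapped through a Rasch ability estimate onto a 49–99 scale anchored at a passing score of 75. The item difficulties, the ability value at the cut, and the scaling slope are hypothetical illustrations only; the JCNDE's actual calibration constants and conversion tables are not reproduced here.

```python
# Illustrative sketch only: the item difficulties, the ability value at the
# passing standard, and the 10-points-per-logit slope below are hypothetical,
# not the JCNDE's actual calibration.
import math

def raw_score(responses):
    """Raw score: one point per correct response, no penalty for incorrect ones."""
    return sum(responses)

def rasch_ability(raw, item_difficulties, tol=1e-6):
    """Estimate the Rasch ability (theta) whose expected number correct equals the raw score."""
    lo, hi = -6.0, 6.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(1.0 / (1.0 + math.exp(-(mid - b))) for b in item_difficulties)
        lo, hi = (mid, hi) if expected < raw else (lo, mid)
    return (lo + hi) / 2.0

def standard_score(theta, theta_cut, slope=10.0):
    """Map ability to the 49-99 standard-score metric, anchoring the passing standard at 75."""
    return max(49, min(99, round(75 + slope * (theta - theta_cut))))

# Hypothetical 100-item form of average difficulty; a candidate answers 80 items correctly.
difficulties = [0.0] * 100
theta = rasch_ability(raw_score([1] * 80 + [0] * 20), difficulties)
print(standard_score(theta, theta_cut=0.5))   # about 84 under these assumptions
```

The point of the sketch is only that the standard score reflects estimated ability relative to the fixed passing standard, not the percentage of items answered correctly on a particular form.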
Nature of the Exam
Before 2007, Part I was a battery composed of individual exams in four major disciplines: 1) anatomic sciences, 2) biochemistry-physiology, 3) microbiology-pathology, and 4) dental anatomy and occlusion.5 Each discipline had 100 multiple-choice items. Each item consisted of a stem, either a question or an incomplete statement, followed by a list of possible options, only one of which was considered the correct or best option.
Beginning in 2007, Part I became a comprehensive exam.6 In the comprehensive format, all 400 items from the four disciplines are intermingled rather than grouped by discipline. Part II still consists of discipline-based and case-based components.7 The discipline-based component has 400 items drawn from nine disciplines: endodontics, operative dentistry, oral and maxillofacial surgery/pain control, oral diagnosis, orthodontics/pediatric dentistry, patient management, periodontics, pharmacology, and prosthodontics. The case-based component has 100 items based on eight to ten actual patient cases. The patient cases are developed so that approximately 30% of the items relate to pediatric cases and 70% to adult cases, and a minimum of 15% of the items address the management of medically compromised patients, both adults and children. A compromised patient is defined as a person whose health status requires modification of standard treatment. Each case in Part II consists of a synopsis of the patient's health and social histories, dental charting, diagnostic radiographs, and, when necessary, clinical photographs of the patient.
Reporting Models
As the testing organization responsible for developing Parts I and II, the JCNDE shares its experience with other organizations in communicating exam results to the various communities of interest. Over the past ten years, the JCNDE has worked with stakeholder groups and has recognized their need for information about exam results and about the limitations of those results. There is a balance to strike between providing meaningful and useful information and enabling inappropriate or invalid use of exam results.
This article describes the JCNDE's efforts in developing four models for reporting Part I and Part II results over the past ten years, each shaped by feedback from the communities of interest. One of the target audiences was the educational community. Each model was created at a different time to help dental schools understand, interpret, evaluate, and communicate student performance on Parts I and II, so that schools could use the information to support instruction, remediation, and, in some schools, decisions about promotion and graduation. This information was provided in concert with the purposes of the exams. The models are distinguished by the kind of information reported: percentage of correct responses, number of correct answers, standard scores, school ranking according to the school's average score, pass/fail reporting without numerical information except for failing students, or an aggregate school average expressed as standard deviations above or below the national mean.
Examples of school performance reports are presented in this article for purposes of illustration. To preserve confidentiality, all personal and identifying information has been removed.
Parts I and II Monthly Reports Prior to 2007
Table 2 is an example of a monthly school report as it would have appeared before 2007. It shows Part I results for students taking the exam in a particular month and year, as indicated in the Test Date field, e.g., July 2002 (0702). Standard scores in each discipline and an average score across the four disciplines are reported for each student. The report also showed the total number of examinees who were currently enrolled in the school (students/current graduates) and who had graduated within the past five years (past graduates).
Monthly report on NBDE Parts I and II before 2007
Student performance on the comprehensive Part II exam for a particular month and year for the same five students is also shown in Table 2. The following information was reported for the disciplines and the case-based component: L for low/bottom third, A for average/middle third, and H for high/top third. The overall standard score and the school average based on students' overall standard scores were also reported. Such a report would typically have been provided after a candidate took the Part II exam; information on Part I performance alone would already have been provided to schools two years earlier.
The presence of a numerical score for each discipline was easy for both candidates and educational institutions to understand. Students knew that a score above 90 could mean acceptance to a residency. Schools also felt that a preponderance of scores above 90 gave them "bragging rights," whether warranted or not. There was little understanding of what constituted a "criterion-referenced" exam or of how valid it was to compare scores across all candidates. Even among well-informed educators, a common misconception was that the score was either a percentage of correct answers or a grade on a national curve. Another poorly understood fact about criterion-referenced exams was that while scores around 75 were highly reliable and comparable from candidate to candidate, reliability dropped as scores deviated from 75; a candidate with a score of 89 and one with a score of 91 therefore could not be properly compared. This was precisely the argument for why the exam had to become pass/fail: to avoid its use for invalid purposes, such as selecting residents for a specialty program only if they scored 90 or above.
The Part II exam initially had numerical scores like the Part I exam, but in 2007 Part II results began to be reported using three qualitative descriptors of performance: High, Average, and Low (Table 1). Educators perceived that reporting model as less desirable because it did not provide an accurate enough gauge of how an individual student performed, regardless of whether a score comparison would have been valid. However, when viewed across an entire student population and across a whole discipline, the qualitative assessment gave a good indication of whether the school or a discipline was performing in the top third, middle third, or bottom third of the overall comparative population.
Part I Profile Report Prior to 2008
Table 3 shows a profile report on Part I performance for 43 of the 56 dental schools in existence at the time. This report was distributed annually to all schools up until 2007. It presented the average performance by school and by Part I discipline for students taking the exam during a 12-month period. The average student performance in standard scores was computed for each school for the four disciplines and then ranked on a scale of 1 to 5, with 5 being the highest quintile. A quintile is one fifth (20%) of an ordered data set: the first quintile (1) represents the lowest fifth of the data (1–20%), the second quintile (2) represents the next fifth (21–40%), and so on.
School profile on NBDE Part I, 2007
Schools were ranked on average score, average discipline scores, and failure rate. This report would typically also contain a school code for identification purposes, which was removed from this sample. Note that the school ranked #1 had an average failure rate of 0.0% and was in the top quintile (5). The school ranked #30, in turn, was in the second quintile from the bottom in anatomy and the third quintile from the bottom in all other categories, with a failure rate of 10.9%. This type of report gave schools their actual average scores by discipline, their ranking in those disciplines, and a comparison with the other schools. At the bottom of the table, a national average score also provided a sense of how each school performed compared to the national average by discipline, overall score, and failure rate.
The report in Table 3 thus compared how a school ranked in individual disciplines, average score, and failure rate. A discipline score was also provided (valid or not, as previously explained). In addition, the column next to each discipline provided context for the school's ranking in that discipline: a school could evaluate how its students collectively performed in a specific discipline over the year compared with the normative national data at the bottom of the table. Five quintiles were listed, with the fifth quintile counterintuitively representing the top and the first quintile the bottom. This reporting model was useful for top schools, giving them bragging rights, and for schools at the bottom, prompting immediate corrective action.
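As a rough illustration of the quintile ranking used in these profile reports, the sketch below orders schools by a hypothetical discipline average and assigns each to a fifth of the distribution, with 5 marking the top fifth and 1 the bottom. The school names and averages are invented, and the handling of ties is an assumption, since the article does not specify it.

```python
def quintile_ranks(school_averages):
    """Map each school to a quintile: 1 = lowest fifth of schools, 5 = highest fifth."""
    ordered = sorted(school_averages, key=school_averages.get)   # ascending by average score
    n = len(ordered)
    # A school at 0-based position p falls in quintile floor(p * 5 / n) + 1.
    return {school: min(5, position * 5 // n + 1) for position, school in enumerate(ordered)}

# Hypothetical school averages (standard scores) on one Part I discipline.
averages = {"School A": 84.2, "School B": 79.5, "School C": 88.1,
            "School D": 81.0, "School E": 76.3}
print(quintile_ranks(averages))
# {'School E': 1, 'School B': 2, 'School D': 3, 'School A': 4, 'School C': 5}
```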
Part II Profile Report Prior to 2013
Table 4 shows another type of profile report, distributed to schools on a yearly basis. This profile report presented the average Part II performance of schools with ten or more students (only 39 schools are shown in this table) for students taking the exam during a 12-month period. Schools are shown in rank order by failure rate, from the best performing school to the worst. Rept.ID is the school code, a random number assigned by the JCNDE; in this table, school codes were replaced with fictitious numbers. The second column contains the rank order of failure rates for the first 39 schools, with the lowest failure rate at the top. The column labeled "Total" represents each school's average percentage of correct responses across the 500 Part II items.
School profile on NBDE Part II, prior to 2013
The remainder of the table shows the average percentage of correct responses for each discipline and for Components A and B. An algorithm converts each student's performance into a standard score; the school average of these standard scores appears in the last column, labeled "Score." For example, the school with ID 22 had a zero failure rate. Its average percentage of correct responses on the entire exam was 73.9%, as shown in the Total column. Its average percentages of correct responses for Component A, Component B, and the nine disciplines were 74.4, 72.0, 74.1, 73.3, 75.1, 73.6, 74.4, 82.9, 72.1, 67.8, and 74.8, respectively. This school's average standard score was 86.5, as shown in the Score column.
The report in Table 4 thus showed school rank by failure rate along with the associated average standard score. This reporting model provided information that was not particularly useful to educators. First, it mixed two types of numbers, percent-correct scores and standard scores, without explaining how the raw score corresponded to the standard score. All the report conveyed was that one could score as low as 30 or 40 in some disciplines and still pass the exam. An explanation of the model and of the algorithm used to convert raw scores into standard scores was missing. School administrators understood the standard scores, their school's performance relative to other schools, and the rank ordering by failure rate; unfortunately, the raw data were of limited value.
Parts I and II Monthly Reports for 2007–12
Table 5 shows a monthly school report for the comprehensive Part I exam and for Part II for the period from 2007 until 2012, when the exams became pass/fail. Because of the nature of the exams, changes were made in this report to address the needs of dental schools, including the average number of correct responses for disciplines and components and a total score reported as a standard score.
NBDE Parts I and II monthly report, 2007–12
In this report, "Scores" and "Non-Std Performance" represent the number of correct responses for the Part I and Part II disciplines, respectively, and "Std Score" represents the total score in the standard-score metric. Additional information was also provided, including the school's average raw score (i.e., average number correct) relative to the national average for each discipline; the school's average standard score relative to the national average; the number of students who took the exam during the reported period for the school (e.g., students/current graduates taking the comprehensive or traditional battery format of Part I); and the number of past graduates who took the exam during the reported period.
Parts I and II Profile Reports for 2007–12
From 2007 through 2012, until the exams transitioned to a pass/fail model, schools also received an annual summary entitled Dental School Profiles. This summary report, shown in Table 6 and Table 7, provided a comprehensive review of all Part I and, separately, all Part II results for a particular school in comparison with the national average of all candidates during the preceding 12-month period. The format was created in response to requests from dental schools for a way to compare their overall performance summaries with national performance. The profile includes the following information: the group analyzed (total number of candidates who took the exam from the particular school vs. total students in the country); percent failure rates for the school vs. the rest of the country; average standard score achieved by discipline; and average raw score (i.e., average number of correct responses out of the 100 administered for each of the four disciplines on Part I and for each of the nine disciplines and Component B on Part II).
School profile on NBDE Part I, 2012
School profile on NBDE Part II, 2012
The advantage of this report model was its brevity. For Part I, it covered only the annual performance of a school in the four disciplines. Because there were only four disciplines and 400 questions, each discipline had 100 questions, so the raw scores (correct answers) were in fact percentages of 100 questions. Even so, score interpretation could be confounded by the different types of scores reported, such as raw score, number of correct answers, and percentage correct out of 100 questions; for example, educators might interpret a value of 56 as a raw score of 56, as 56 correct answers, or as 56% correct. The standard score was useful. The Part II report shown in Table 7, however, only made sense if the exam guide and the composition of the exam, with the discipline-specific breakdown of questions, were available.
Reports in Accordance with Pass/Fail Policy
Monthly Report
With the move to a pass/fail format starting in 2012, the monthly reports no longer conveyed individual scores, only a pass or fail result. Table 8 shows a sample monthly school report, currently in use, designed in accordance with the JCNDE's pass/fail reporting policy. It includes the following information, as shown in the "Detail Report": reporting period (month, year), title of exam, school name (school code), student name, student identification number, pass/fail status, test date, year of graduation, and year of birth.
Sample monthly report for NBDE Part I or II since 2012: pass/fail format
As shown in Table 9, which is part of the monthly report, the following additional information is included if more than ten students completed the examination: 1) a d-value representing the standardized difference between the school's average standard score and the national average standard score; 2) a d-value representing the standardized difference between the school's average raw score (i.e., average number correct) and the national average for each of the disciplines covered on the exam; and 3) the number of current students/recent graduates and/or past graduates taking the exam. The interpretation of a d-value provided in a cover letter to schools is as follows: "A d-value is a standardized value representing the distance between your school's average and the national average in standard deviation units. A positive d-value of 1.0 indicates that your school average is one standard deviation above the national average. A d-value of −1.0 indicates that your school average is one standard deviation below the national average. A d-value of 0 would indicate that your school's average falls directly on the national average."
Summary report for NBDE Part II after pass/fail format introduced in 2012
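The d-value described in the cover letter is a standardized mean difference, which a minimal sketch can make explicit. The numbers below are hypothetical, and the use of the national standard deviation of candidate scores as the denominator is an assumption consistent with the cover letter's wording rather than a published JCNDE formula.

```python
def d_value(school_mean, national_mean, national_sd):
    """Standardized distance between a school's average and the national average,
    expressed in national standard deviation units (assumed denominator)."""
    return (school_mean - national_mean) / national_sd

# Hypothetical example: a school average standard score of 82.0 against a national
# average of 79.0 with a national standard deviation of 4.0 gives d = +0.75, i.e.,
# three quarters of a standard deviation above the national average.
print(d_value(82.0, 79.0, 4.0))   # 0.75
```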
The report shown in Tables 8 and 9 ended the provision of scores that could be misused and restored the original intent of the exam programs: providing information on the safety of a candidate (pass) or the risk posed by a lack of knowledge (fail). Pass/fail was the only valid information that could be reported. It was well known that board scores had been used to screen applicants for residency programs. In the absence of a numerical score, and to provide useful information regarding students' abilities, a new test, the Advanced Dental Admission Test, was introduced; it debuted in spring 2016. Students who want to be admitted to select residency or specialty programs will be required to take this new test, which will provide program directors with a score potentially useful in the admission process.
The reporting model illustrated in Tables 8 and 9, with its standard-deviation-based d-value, provides a good gauge of how a school performed compared with the rest of the country. This reporting model is valid and currently in use.
Profile Report
Figure 1 shows a sample school profile report distributed to accredited schools annually in accordance with the pass/fail reporting policy. All the numerical information is shown in a graphical format. Specifically, the d-value by discipline is presented in bar charts. This figure also shows the standard score d-value and the failure rate trend for five years (school vs. national).
Example of NBDE Part II profile report for 2013
Conclusion
During the past decade, the models for reporting students' performance on the NBDE were the result of a dialogue among the JCNDE, licensing boards, the educational community, and candidates. As the format of the exams improved, becoming computerized, comprehensive, and pass/fail, and in response to requests from schools, the JCNDE adjusted its reporting format to provide additional information on students for benchmarking and remediation, as shown in the reporting models. Each reporting model provided different kinds of value to address and satisfy the needs of schools and candidates.