Abstract
Dental educators aim to promote the integration of knowledge, skills, and values toward professional competence. Studies report that retrieval practice, in the form of testing, results in better learning and retention than traditional studying. The aim of this study was to evaluate the impact of test-enhanced experiences on demonstrations of competence in the diagnosis and management of malocclusion and skeletal problems. The study participants were all third-year dental students (2011 N=88, 2012 N=74, 2013 N=91, 2014 N=85) at New York University College of Dentistry. The 2013 and 2014 groups received the test-enhanced method emphasizing formative assessments with written and dialogic delayed feedback, while the 2011 and 2012 groups received the traditional approach emphasizing lectures and classroom exercises. All students received six two-hour sessions, spaced one week apart. At the final session, a summative assessment consisting of the same four cases was administered. Students constructed a problem list, treatment objectives, and a treatment plan for each case, scored according to the same criteria. Grades were based on the number of cases without critical errors: A=0 critical errors on four cases, A−=0 critical errors on three cases, B+=0 critical errors on two cases, B=0 critical errors on one case, F=critical errors on all four cases. Performance grades were categorized as high quality (B+, A−, A) and low quality (F, B). The results showed that the test-enhanced groups demonstrated statistically significant benefits (at 95% confidence intervals) compared with the traditional groups when low- and high-quality grades were compared. These performance trends support the continued use of the test-enhanced approach.
Keywords: dental education; clinical competence; orthodontics; competency-based education; assessment; test-enhanced learning
In the past decade, assessments have taken a central place in education—both K-12 and higher education including health professions.1 Assessments have traditionally been used to draw inferences concerning students’ knowledge and/or skill, to guide instructional and curricular decisions, and to serve as a basis for grading.2,3 The effect of taking tests has been noted over time from various disciplines and perspectives, with varying explanations and inferences.1,3
In education, a common approach is to begin with a diagnostic pretest and to conclude a course of instruction with a posttest.3 The idea is to infer the benefit of the instruction. Psychometricians, however, have discovered that results improve even without instruction.4 From a research perspective, testing has thus been considered a “threat to validity.”3 Educators saw the same phenomenon in pre- and posttesting (the reactive effect) and offered reasonable explanations for the noted improvement: having taken a pretest, students are sensitized to the test-taking experience.3 The pretest spurs student interest, especially in deficient areas. Students are more likely to go home, read up, and fill in those gaps.
Depending on how assessments are used, they may be beneficial or counterproductive to learning. The usefulness of testing is related to the nature of the assessment itself.3 Useful assessments are connected to real-world, meaningful targets of instruction (in contrast to checking off boxes or searching for a correct answer).2 Cognitive psychologists have recognized that repeated testing enhances learning and retention of material.1,5 This phenomenon, the “testing effect,” was recognized as early as 1922.6 The benefit to learning is modified by contributing factors such as frequency of testing, timing of feedback, and types of questions.7 Butler et al. found that feedback delayed by one day was more effective than immediate feedback when students were retested a week later on a final cued-recall test.8 The types of questions asked on tests also seem to contribute to learning. Test questions may be divided into types: recognition or selected response (multiple-choice, true/false, etc.) and production or constructed response (short answer, essays, etc.). Previous research found that retention is improved with production-type questions, which require more active retrieval of information from memory.9,10
In The Art of Changing the Brain, James E. Zull tied learning theory to neuroscience.11 He described Kolb’s cycle of experiential learning as it relates to brain/neuronal structure and dynamics. Zull described active testing as part of the learning process. For deep learning to occur, ideas must be tested. In Zull’s construct, active testing flows naturally from a process of constructing understanding of experiences and includes discussion, writing, living, and experimenting. Active testing is a physical process. Through testing, ideas change from the abstract to the concrete. Despite different perspectives and inferences, all of these disciplines saw a benefit in testing: scores improved and knowledge was retained better.
Dental educators are confronted with challenges that include an increasing curricular load requiring students to learn and demonstrate knowledge and skills in various disciplines; the integration of knowledge, skills, and values in patient care; and the assessment of performance relative to a baseline level of competence.12 Dental educators are searching for efficient and effective ways to promote learning and retention. The goal is to bridge classroom or online learning experiences with clinical applications. Any technique or strategy that can bridge the gap between initial learning and future application will save the time and resources needed for relearning.13 Could repeated testing with feedback be useful in competency-based education?
Initially, retrieval research focused on matching, word-pairs, and short answer tests. These studies took place in a laboratory, not in a real classroom setting.13–15 Recent investigations that examined the effects of test-enhanced learning beyond simple memory and recall have ventured outside of the laboratory. Reported benefits of test-enhanced learning have been replicated in medical, dental, and dental hygiene programs.16–18 Larsen et al. conducted a randomized controlled trial that identified benefits of test-enhanced learning with medical residents in a real classroom setting.17 That research was followed by a study demonstrating that testing positively affected resident performance on exams with simulated patients.19 Kromann et al. similarly reported that test-enhanced learning positively affected skill acquisition in a mannequin CPR training session.20 Reports regarding test-enhanced learning in the dental literature have been sparse when compared to the medical literature. Baghdady et al. focused their research on the diagnosis of pathology using dental hygiene students in a tightly controlled laboratory setting, while Jackson et al. concentrated on how the testing/retrieval effect influenced dental students’ web-based learning.16,18
Prior researchers have not addressed the effectiveness of test-enhanced learning relative to competence in dental education. The aim of this study was to evaluate the impact of test-enhanced experiences on higher order cognitive processes (analyzing, synthesizing, critical thinking, problem-solving, decision making) relative to demonstrations of competence in the diagnosis and management of malocclusion and skeletal problems. Although many studies have examined the effect of retrieval on academic performance and retention of information, medical and dental education research is only beginning to analyze how the testing effect can translate to clinical competence. The question remains: will the documented benefits of test-enhanced learning/retrieval affect demonstrations of professional competence? This four-year study addressed this question by contrasting a traditional instructional method (lecture presentations and classroom exercises) with a test-enhanced approach based on a series of formative case-based assessments with written and dialogic feedback.
Methods
The study was deemed exempt by the New York University Institutional Review Board (#13-9723). Four groups of third-year (D3) dental students participated in the study as part of their orthodontics seminar course at New York University College of Dentistry (NYUCD) (2011 N=88, 2012 N=74, 2013 N=91, 2014 N=85). Aggregate averages of undergraduate GPAs and science GPAs for incoming dental students corresponding to student groups for each year in the study were obtained from the NYUCD admissions office and were comparable.
All students were enrolled in six two-hour sessions, spaced one week apart, and a self-study component. Key enabling knowledge presented in the first- and second-year curricula was reviewed and applied in the context of clinical case simulations. Evaluative criteria, the basis for assessment and instruction, were applied in each session. Details of the course, including curriculum, format, resources, attendance, and grading policies, have been previously reported.21,22 The traditional approach, TA (2011, 2012), emphasized lecture presentations and classroom exercises, while the test-enhanced approach, TEA (2013, 2014), emphasized formative assessments with written (comments, grades, emoticons) and dialogic (discussion) delayed feedback. Since the course occurred on a weekly basis, feedback was given in the following session.
Formative and summative assessments were based on clinical simulation cases. Each case consisted of a history and images: three facial photographs, five intraoral photographs (maxillary and mandibular occlusal views and maximum intercuspation from right side, left side, and frontal), a panoramic radiograph, and a cephalometric radiograph (Figure 1). The assessment entailed constructing a problem list, treatment objectives, and a treatment plan based on a case.
Figure 1. Example of images used in clinical cases
Note: Images are clockwise from top left: facial profile, facial frontal smiling, facial frontal in repose, lateral cephalometric radiograph, mandibular occlusal, left maximum intercuspation, frontal maximum intercuspation, right maximum intercuspation, maxillary occlusal. Panoramic radiograph is at center.
A demonstration of competence equaled zero critical errors. Critical errors included failure to recognize or misrepresentation of any of the evaluative conditions. Critical errors were categorized as primary (treatment planning) and secondary (diagnostic). Primary errors included failure to recognize care/referral required or suggesting inappropriate or unnecessary care/referral. After screening for primary errors, problem lists were closely reviewed for diagnostic (secondary) errors relative to the evaluative criteria. Grading of assessments was described in greater detail in previous articles.21,22
At the final session, a summative assessment consisting of four cases was administered to the groups. Regardless of instructional approach (TA or TEA), the summative assessment consisted of the same cases for the groups compared. Assessments were scored according to the same evaluative criteria (the basis for critical errors) by a single grader. Grades were based on the number of cases without critical errors: A=0 critical errors on four cases, A−=0 critical errors on three cases, B+=0 critical errors on two cases, B=0 critical errors on one case, F=critical errors on all four cases. Grades were categorized as high quality (A, A−, B+) or low quality (B, F). Because a B grade means that three of the four cases contained critical errors, we considered it a poor indicator of true competence and grouped it with F as low quality.
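To make the grading rubric concrete, the following is a minimal sketch in Python; the function and variable names are hypothetical (the study does not describe any grading software), and the sketch simply maps per-case critical error counts to the letter grades and quality categories described above.

```python
# Minimal sketch of the grading rubric described above (hypothetical names;
# the study did not use software for grading).

def summative_grade(critical_errors_per_case):
    """Map per-case critical error counts (four cases) to a letter grade.

    A case demonstrates competence when it contains zero critical errors.
    """
    if len(critical_errors_per_case) != 4:
        raise ValueError("A summative assessment consists of exactly four cases")
    competent_cases = sum(1 for errors in critical_errors_per_case if errors == 0)
    grade = {4: "A", 3: "A-", 2: "B+", 1: "B", 0: "F"}[competent_cases]
    quality = "high" if grade in ("A", "A-", "B+") else "low"
    return grade, quality

# Example: zero critical errors on three of the four cases -> A-, high quality
print(summative_grade([0, 2, 0, 0]))  # ('A-', 'high')
```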
Data for the TA and TEA groups were collapsed and compared across the four years. The data included 1) the composite distribution of evaluative outcomes based on the four cases used in summative assessments; 2) pass rates on summative assessments (all grades except F); 3) low- vs. high-quality grades; 4) performance rates on each case, with the percentage of students demonstrating competence; and 5) the distribution of grades for groups taking the same summative assessment. All rates were compared using 95% confidence intervals computed with VassarStats (vassarstats.net).
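For illustration only, the sketch below shows how 95% confidence intervals for a rate and for the difference between two independent rates can be computed with the standard normal approximation. The exact VassarStats procedure is not specified in the article, and the success counts in the example are hypothetical; only the group sizes correspond to the combined cohorts (TEA N=176, TA N=162).

```python
# Illustrative 95% confidence intervals for rates and for the difference
# between two independent rates (normal approximation). This is a sketch of
# the general approach, not the procedure actually used in the study.
from math import sqrt

Z95 = 1.96  # two-sided 95% critical value

def proportion_ci(successes, n):
    """95% CI for a single rate, e.g., students demonstrating competence."""
    p = successes / n
    half_width = Z95 * sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def difference_ci(successes1, n1, successes2, n2):
    """95% CI for the difference between two independent rates (e.g., TEA vs. TA)."""
    p1, p2 = successes1 / n1, successes2 / n2
    half_width = Z95 * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - half_width, diff + half_width

# Hypothetical example: 120 of 176 TEA students vs. 90 of 162 TA students
# earning high-quality grades. A difference interval that excludes zero
# indicates statistical significance at the 95% level.
print(proportion_ci(120, 176))
print(difference_ci(120, 176, 90, 162))
```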
Conditions governing the administration of summative assessments were designed to ensure the integrity of the examination and discourage student dishonesty. Summative assessment cases were not available to students in a digital format (online, PowerPoint, etc.). The cases were in exam books, counted before and after each assessment session. Assessment responses were not returned to students and were maintained in a secure location. Since the course is repeated four times across the academic year with different cases appearing on summative assessments, it was not possible for students to know which cases would appear on the assessment from year to year.
Results
All students in the four specified years participated in the study. A summary of the grade distribution for all four years, prior to being combined for analysis, appears in Table 1. The groups were determined to be homogeneous, allowing the data to be consolidated. There was no significant difference between the TEA and TA groups in overall pass rates (Figure 2). When cases were analyzed individually, the TEA group had significantly higher competence rates for cases B and C. Similar trends were observed in a follow-up study conducted with different groups of students and different assessment cases (see Discussion).
Figure 2. Pass rates for traditional (blue) and test-enhanced (red) groups, overall and by case
Note: Panel A shows comparison for the two groups overall. Panel B shows comparisons for individual cases. The test-enhanced group showed statistically significant improvement on cases B and C (indicated by * on graph). Both graphs include 95% confidence intervals.
Table 1. Summary of rates for each letter grade for 2011–14 with 95% confidence intervals
The TEA group showed a significant increase in high-quality grades (Figure 3). It also demonstrated significant increases in A− grades (0 critical errors in three of four cases) and B+ grades (0 critical errors in two of four cases) and a significant decrease in B grades (0 critical errors in one of four cases). A composite of possible outcomes (Figure 4) for the four assessment cases shows favorable results for the TEA group: an increase in cases with zero critical errors and a decrease in primary and secondary errors.
Figure 3. Differences in high- and low-quality grades and in summative assessment grades between traditional (blue) and test-enhanced (red) groups
Note: Panel A shows high- versus low-quality grades, with a significant difference (*) between the two groups for both. Panel B shows the distribution of summative assessment grades, with significant increases (*) in A− and B+ grades and a significant decrease (*) in B grades. Both graphs include 95% confidence intervals.
Figure 4. Composite of evaluative outcomes for traditional and test-enhanced groups based on the same four cases in summative assessments
Discussion
In this assessment, we emulated the actual experience of reviewing patient records and formulating a diagnosis and treatment plan, closely mirroring the cognitive processes involved in patient care. We changed the instructional methodology of the course to a test-enhanced approach based on evidence suggesting that combining assessments with feedback facilitates learning and achievement.
In this study, the instructional target was a complex behavior: diagnosis and treatment planning involving higher order cognitive skills (analysis, synthesis, evaluation, critical thinking, logic/reason, problem-solving, decision making, and ethical judgment). It is difficult to measure these attributes. We can, however, observe group performance on the summative assessment. The test-enhanced group performed better than the traditional group, with more consistent demonstrations of competence: more demonstrations of zero critical errors and fewer primary and secondary errors. Although pass rates were essentially the same, the quality of overall grades was higher. Notably, the percentage of students receiving A− grades increased by more than two confidence intervals. To look at this another way, the number of students who received zero critical errors on three of four assessment cases approximately tripled.
Although the test-enhanced group’s overall performance was better than that of the traditional group, the effect was not uniform across cases. Confounders that need to be explored include the difficulty of each case. For example, both cases B and C showed a significant decrease in primary errors and a significant increase in students demonstrating competence; however, less than 15% of test-enhanced students demonstrated competence on case C, while 60% demonstrated competence on case B. The study was repeated across the same years with different D3 groups receiving summative assessments based on four different cases, in order to better gauge the instructional effect. In that follow-up study, similar trends were observed, but it was harder to demonstrate statistically significant benefits. Again, this finding points to the variable difficulty of individual cases. These questions will be addressed in future studies, as we intend to investigate performance patterns in order to construct summative assessments of equivalent difficulty.
This study occurred under live, real-world classroom conditions and had limitations. From an experimental design perspective, there were no control and test groups within each year. Students in our college cannot be randomized into the same course with different instructional methods. It was also impossible to know what students were experiencing outside the classroom or what study methods or external factors affected performance on the summative assessment. Although the evaluative criteria, the cases on the assessment, and the grader were constant throughout the study, the study lacked protocols to ensure calibration in grading. To counter this, assessments were re-graded to ensure that the same standards were applied across the years studied.
Conclusion
Since changing the course to a test-enhanced instructional method, we have observed improved outcomes in students’ ability to diagnose and plan treatment for patients with malocclusion and associated skeletal problems. This study supported the continued use of the test-enhanced instructional method and additional research to clarify potential confounders (case difficulty, grading protocols, and external factors) in order to validate these findings.
Acknowledgments
We thank Dr. Mal Janal for his statistical consultation and Eileen Rosa and Nhung Quan for the organization and support that made this project possible.
Disclosure
No disclosures were reported by the authors.