J Dent Educ. 72(6): 719-724 2008
© 2008 American Dental Education Association
Educational Methodologies
Short- and Long-Term Effects of Training on Dental Hygiene Faculty Members' Capacity to Write SOAP Notes
Mary E. Jacks, M.S., R.D.H.;
Christine Blue, M.S., R.D.H.;
Douglas Murphy, Ph.D.
Key words: calibration, dental hygiene faculty, SOAP notes
Submitted for publication 06/13/07;
accepted 02/19/08
 |
Abstract
|
|---|
Calibration among faculty is challenging to achieve and maintain. In this study, calibration refers to the training process by which standardization of chart documentation in a SOAP note format was achieved. In the SOAP format, chart entries by health care providers are written in the following categories: Subjective data, Objective data, Assessment, and Plans. The primary training "effect" or outcome measured in this study was the capacity of faculty members to write a SOAP note that adhered to prescribed standards for chart documentation. This study was conducted to assess the short-term effects of training and determine whether faculty members' capacity to write appropriately constructed SOAP notes could be sustained for one year. Eight dental hygiene faculty members at the University of Minnesota participated in a pre-training assessment in which they prepared a SOAP note based on a patient case, completed a training session on writing SOAP notes, and completed a post-training test shortly after training that also consisted of writing a SOAP note based on a patient case. One year later, a follow-up test, similar to the pre- and post-tests, was conducted. Each component of the SOAP note was compared and scored against a gold standard benchmark score of 29, which represented the number of items that, in the estimation of the investigators, should have been included in an ideal SOAP note, based on chart documentation guidelines of the University of Minnesota Dental Hygiene Division. The mean score for the pre-test was 18.25 (SD=2.82), which represented 63 percent of the benchmark gold standard score of 29. The post-test mean score immediately after training was 24.63 (SD=2.13; 84.9 percent of the benchmark score), and the one-year follow-up mean score was 22.75 (SD=1.83; 78.4 percent of the gold standard benchmark).
From the pre-test to the post-test, administered close in time to the SOAP note training, faculty members' approximation of the gold standard benchmark increased by 35 percent, or 6.38 points, and from the post-test to the follow-up test one year later, approximation of the benchmark score decreased by approximately 8 percent, or 1.88 points. Friedman's test indicated that the differences in mean scores for the pre-test, post-test, and follow-up test were significant. The Sign test was used for post hoc tests; alpha was adjusted using Bonferroni's procedure. Conclusions support a hypothesis that faculty capacity to write a SOAP note that adheres to standards can be increased through training and that the effects can be maintained over a period of approximately one year.
One of the greatest challenges in dental and dental hygiene clinical education is achieving faculty calibration, or agreement among faculty, during student performance evaluations. Inconsistent clinical evaluation among faculty has long been a source of frustration for students, since lack of agreement among faculty can negatively affect students' learning outcomes. If grades are inaccurately or unfairly assigned, student morale may suffer, and motivation to improve skills may decline. A study of three dental schools revealed that the lack of calibration among dental educators was "a significant source of trouble, worry, and discomfort; a major source of anger; and one of the primary reasons for abandonment of a quest for excellence and resignation to just getting by."1 Lanning et al. reported that students believed that differences between their instructors affected their clinical progress. These authors found that students may adopt "faculty-specific" strategies for addressing clinical problems instead of striving to meet the standard criteria.2 Haj-Ali and Feil asserted that students rely on faculty for feedback on their clinical performance and use this feedback "to make appropriate alterations in the next attempt in order to achieve a higher level of performance."3 If consistent and reliable evaluation is absent in the clinic environment, students become confused as to the standard of performance expected, and progress toward competency can be delayed. In addition, the ability and motivation to self-evaluate, a skill necessary for lifelong learning, may disappear when the student is confronted with contradictory feedback from faculty. Lack of calibration in clinical evaluation can also be a source of frustration for faculty, especially new faculty who strive to teach effectively and grade reliably.
Studies assessing inter-rater agreement among medical and dental faculty have yielded poor results. Biller and Kerber studied the reliability of scaling error detection among dental hygiene instructors and found that 64 percent of the grade given was due to differences among several instructors rather than the student performance.4 Lanning et al.2 found substantial variation among instructors in radiographic interpretation, diagnosis, and treatment planning for common periodontal diseases. Other authors have reported inconsistent agreement in treatment decision making regarding carious lesions.5–8 Pippin and Feil found that inter-rater consistency in the detection of subgingival calculus following scaling showed modest to poor reliability. Examiners found residual calculus on 18.8 percent of identified root surfaces, compared to 57 percent found with microscopic evaluation of the same surfaces. Kappa values, used to estimate inter-rater reliability, ranged from .05 to .27. These authors concluded that, if faculty define the terms for evaluation and participate in training, consistency among raters would improve.9
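As an illustration of how a kappa value of the size cited above can arise, the following is a minimal sketch of Cohen's kappa for two raters. It assumes the two-rater form of the statistic (the cited study may have used a different variant), and the ratings are hypothetical, invented only for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: 1 = calculus detected on a surface, 0 = not detected.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this invented example the raters agree on 6 of 10 surfaces, yet kappa is low because much of that agreement would be expected by chance alone.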
For this study, the term "calibration" refers to the training process by which standardization among evaluators is achieved. The goal of calibration training is to reach a point at which 1) all raters provide evaluations that are within an acceptable range of a benchmark or gold standard; 2) different evaluators provide ratings that are similar to each other, within a prescribed range, when observing or critiquing students' performance; and 3) the evaluators are internally consistent in that they rate the same performance in a similar manner each time it is observed. Thus, the intended outcome of a calibration activity is acceptable levels of inter-rater agreement and intra-rater consistency. What is not known is how long the effects of calibration can be sustained. Haj-Ali and Feil concluded that, with calibration training, evaluators' agreement with a gold standard can improve and such improvement is reasonably resistant to deterioration after ten weeks.3
The University of Minnesota (U of M) uses the SOAP format for keeping accurate patient records. In the SOAP format, chart entries by health care providers are written in the following categories: Subjective data, Objective data, Assessment, and Plans. The format is intended to standardize documentation, to make the information recorded in the patient's chart more uniform, and to provide a structure that encourages systematic evaluation of the patient's health status, leading to appropriate therapy. At the U of M, chart documentation protocols were established by faculty and published in a clinic manual for faculty and student reference; this document serves as the gold standard or benchmark for chart documentation. Traditionally, the clinic director has the responsibility of ensuring that the documentation standards are followed as closely as possible. Dental hygiene faculty at the U of M are responsible for ensuring that students record accurate treatment information in their patient charts. Faculty are also required to sign the chart as legal proof that the student was supervised during clinical care and that the documented treatment was provided. However, annual chart audits found that benchmark standards were not being maintained and that faculty were inconsistent in their expectations for student SOAP note entries. This inconsistency produced unacceptable variation among faculty in how students' documentation was assessed, which, in turn, produced concern and uncertainty among the students. To address this situation, training workshops were conducted to enhance faculty capacity to write SOAP notes themselves, with the aim of improving consistency in evaluating students' chart entries.
To measure the outcomes of the SOAP note workshop, the study reported here was conducted to assess the short-term effects of training (comparing performance immediately before and after training) and determine whether faculty members' capacity to write appropriately constructed SOAP notes could be sustained for one year.
 |
Materials and Methods
|
|---|
Prior to the beginning of the fall term, faculty in the Division of Dental Hygiene at the U of M were given a fictitious clinical case and asked to write a SOAP note entry that would reflect the initial patient appointment. Twelve faculty members submitted pre-training SOAP notes to their program director. The information was then transferred electronically to the lead author for evaluation. Each faculty SOAP note was compared to the SOAP note criteria, as listed in Table 1, that were defined in the U of M clinic manual. The criteria in Table 1 served as the benchmark gold standard used for assessment of the SOAP notes written by participants in this study. Each correct item was given a one-point value, with twenty-nine possible points. The scores for the SOAP notes developed by the eight faculty members who participated in all tests represent the pre-training test scores.
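The one-point-per-item scoring described above can be sketched as a simple checklist comparison. This is only an illustration under stated assumptions: the item identifiers below are hypothetical placeholders (the actual criteria are those in Table 1 of the clinic manual), and in the study the comparison was made by a human scorer rather than by code.

```python
BENCHMARK_ITEM_COUNT = 29  # number of possible points, as stated in the text


def score_note(items_present, benchmark_items):
    """Award one point for each benchmark item found in the note.

    Items not on the benchmark list earn nothing, and an item that is
    listed more than once is still counted only once.
    """
    return len(set(items_present) & set(benchmark_items))


# Hypothetical placeholder identifiers standing in for the Table 1 criteria.
benchmark = [f"item_{i:02d}" for i in range(1, BENCHMARK_ITEM_COUNT + 1)]

# A hypothetical note that covers 18 benchmark items plus one unlisted remark.
note = benchmark[:18] + ["unlisted_remark"]
score = score_note(note, benchmark)
print(f"{score} of {BENCHMARK_ITEM_COUNT} points")
```

Representing the benchmark as a fixed list keeps the maximum score at twenty-nine no matter how much additional text a note contains.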
Table 1. SOAP note documentation criteria as described in the University of Minnesota Division of Dental Hygiene clinic manual
One week later, approximately twenty faculty members attended the first face-to-face training workshop, including the eight faculty members who participated in the pre-test. The workshop was conducted by author MJ, who was not associated with the U of M and has more than ten years' experience in leading faculty training and calibration workshops. During the initial training session, faculty members were encouraged to discuss the criteria for each item within the SOAP note until all agreed. At the conclusion of the workshop, U of M faculty were asked to complete a new SOAP note on the same case. These SOAP notes were scored by author MJ in the same manner as the pre-test. Scores from the eight faculty members who participated in the pre-test are reported as the post-training test.
One year later, to assess the long-term effects of training, U of M dental hygiene faculty were asked to write a third SOAP note on the same case. These SOAP notes, referred to as the follow-up test, were scored by author MJ in the same way as the pre- and post-training tests, by comparing faculty responses to the gold standard. The next day, a second face-to-face SOAP note workshop was conducted by author MJ. About twenty participants asked questions and discussed various aspects of the documentation protocol. The results of the pre- and post-training and the one-year follow-up test are reported for the eight faculty members who participated in all phases of the study.
 |
Results and Discussion
|
|---|
The means, standard deviations, and accuracy percentage scores, compared to the gold standard, were calculated for the eight faculty members who participated in the pre-training, post-training, and follow-up tests and attended the SOAP note workshops (Figure 1). The mean score for the pre-test was 18.25 (SD=2.82), which represented 63 percent of the benchmark gold standard score of 29. The post-test mean score immediately after training was 24.63 (SD=2.13), 85 percent of the benchmark score, and the one-year follow-up mean score was 22.75 (SD=1.83), which was 78.4 percent of the gold standard benchmark. From the pre-test to the post-test, administered shortly before and after the SOAP note training, faculty members' approximation of the gold standard benchmark increased by 35 percent, or 6.38 points; from the post-test to the follow-up test one year later, approximation of the benchmark score decreased by about 8 percent, or 1.88 points. Friedman's test was used to examine differences among performance on the pre-training, post-training, and follow-up tests. The test indicated significant differences in central tendencies (χ²=13.87, p=.001). To determine which differences were significant, post hoc tests were conducted using the Sign test; alpha was adjusted using Bonferroni's procedure (adjusted α=.017). Results of the Sign tests indicated significant differences between the pre- and post-training (p=.008) and the pre-training and follow-up (p=.008), but not between the post-training and follow-up (p=.125). These data are presented in Table 2.
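The reported test values can be checked with a few lines of arithmetic. The sketch below makes several assumptions that are not stated in the article: a family-wise alpha of .05 divided across three pairwise comparisons, a two-sided exact Sign test with ties dropped, and hypothetical win/loss splits (individual faculty scores are not reported, so the splits are shown only because they happen to reproduce the published p-values). The function names are illustrative.

```python
from math import comb, exp


def sign_test_two_sided(n_pos, n_neg):
    """Exact two-sided Sign test p-value (ties assumed already dropped)."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    # Probability of a split at least this lopsided in one direction
    # when each non-tied pair is equally likely to go either way.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of
    freedom, which has the closed form exp(-x / 2)."""
    return exp(-x / 2)


# Assumption: family-wise alpha of .05 over three pairwise comparisons.
alpha_adjusted = 0.05 / 3

# Friedman's test on three conditions has 3 - 1 = 2 degrees of freedom.
p_friedman = chi2_upper_tail_df2(13.87)

# Hypothetical splits that are consistent with the published p-values.
p_eight_of_eight = sign_test_two_sided(8, 0)
p_four_of_four = sign_test_two_sided(4, 0)

print(f"adjusted alpha   = {alpha_adjusted:.3f}")
print(f"Friedman p       = {p_friedman:.3f}")
print(f"Sign test, 8 v 0 = {p_eight_of_eight:.3f}")
print(f"Sign test, 4 v 0 = {p_four_of_four:.3f}")
```

Under these assumptions, the two comparisons with p=.008 fall below the adjusted alpha, while p=.125 does not, which matches the pattern of significance reported above.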

Figure 1. Means, standard deviations, and accuracy percentages of the pre-, post-, and follow-up SOAP note training
Table 2. Means, standard deviations, and accuracy percentages for pre-training, post-training, and one-year follow-up (N=8)
This pattern of findings indicates that the training efforts were successful in improving faculty performance following the pre-training and that those gains were maintained one year afterward. It is also notable that standard deviations for total means decreased from 2.82 for the pre-training to 2.13 for the post-training, to 1.83 for the follow-up, indicating that faculty performance on SOAP notes became less variable as they proceeded through the components of the study. In other words, faculty performed more consistently over the course of the calibration project.
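The percentages and point changes discussed in this section can be re-derived from the three reported means and the benchmark of 29. The sketch below is a simple arithmetic check, assuming that the percentage changes are expressed relative to the earlier mean score; the variable and function names are illustrative.

```python
BENCHMARK = 29
means = {"pre": 18.25, "post": 24.63, "follow_up": 22.75}


def percent_of_benchmark(score):
    """Mean score expressed as a percentage of the benchmark score."""
    return 100 * score / BENCHMARK


def relative_change(old, new):
    """Change from old to new as a percentage of the old value."""
    return 100 * (new - old) / old


accuracy = {phase: round(percent_of_benchmark(m), 1) for phase, m in means.items()}
# accuracy: pre 62.9, post 84.9, follow_up 78.4

gain_points = round(means["post"] - means["pre"], 2)                       # 6.38
gain_percent = round(relative_change(means["pre"], means["post"]), 1)      # 35.0
loss_points = round(means["post"] - means["follow_up"], 2)                 # 1.88
loss_percent = round(relative_change(means["post"], means["follow_up"]), 1)  # -7.6

print(accuracy)
print(gain_points, gain_percent, loss_points, loss_percent)
```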
 |
Conclusion
|
|---|
Faculty calibration is a worthy goal for dental and dental hygiene education because it has the potential to improve learning outcomes and to result in more equitable evaluation of students. Previous studies indicate that faculty training and calibration can increase agreement among faculty.1,10–13 In our study, faculty accuracy (ability to approximate defined documentation standards) in writing clinical SOAP notes improved significantly, by 35 percent, from the pre- to the post-training tests. In addition, the follow-up test revealed a decline of only 1.88 points (about 8 percent) in SOAP note scores one year later, a difference that was not statistically significant.
These findings support the conclusion of Haj-Ali and Feil.3 If faculty are calibrated using a gold standard and that gold standard is taught to students, students know what is expected and are more likely to make progress toward competency. The results of our study demonstrate that the enhancement of faculty capacity to write SOAP notes that approximate benchmark gold standards was maintained up to a year after the initial workshop. More research is needed in this area to determine at what frequency faculty recalibration should occur.
 |
Author Information
|
|---|
Prof. Jacks is Assistant Professor, Director of Advanced Dental Hygiene Degrees, Department of Dental Hygiene, University of Texas Health Science Center at San Antonio; Prof. Blue is Director and Assistant Professor, Division of Dental Hygiene, School of Dentistry, University of Minnesota; Dr. Murphy is Associate Dean, School of Allied Health Science, University of Texas Health Science Center at San Antonio. Direct correspondence and requests for reprints to Mary E. Jacks, University of Texas Health Science Center at San Antonio, Department of Dental Hygiene, MSC 6244, 7703 Floyd Curl Drive, San Antonio, TX 78229-3900; 210-567-8837 phone; 210-567-7743 fax; jacks{at}uthscsa.edu.
This project was funded by faculty development funds from the University of Minnesota.
 |
REFERENCES
|
|---|
1. Fuller JL. The effects of training and criterion models on interjudge reliability. J Dent Educ 1972; 36(4):19–22.
2. Lanning SK, Pelok SD, Williams BC, Richards PS, Sarment DP, Oh TJ, McCauley LK. Variation in periodontal diagnosis and treatment planning among clinic instructors. J Dent Educ 2005; 69(3):325–37.
3. Haj-Ali R, Feil P. Rater reliability: short- and long-term effects of calibration training. J Dent Educ 2006; 70(4):428–33.
4. Biller IR, Kerber PE. Reliability of scaling error detection. J Dent Educ 1980; 44(4):206–10.
5. Espelid I, Tveit AB, Fjelltveit A. Variations among dentists in radiographic detection of occlusal caries. Caries Res 1994; 28(3):130–6.
6. Mileman PA, Pudell-Lewis, van der Weele LT. Variation in radiographic caries diagnosis and treatment decision making among university teachers. Community Dent Oral Epidemiol 1982; 10(6):329–4.
7. Rytomaa I, Jarvinen V, Jarvinen J. Variation in caries recording and restorative treatment plan among university teachers. Community Dent Oral Epidemiol 1979; 7(6):335–9.
8. Bader JD, Shugars DA. Agreement among dentists' recommendation for restorative treatment. J Dent Res 1993; 72(5):891–6.
9. Pippin DJ, Feil P. Interrater agreement on subgingival calculus detection following scaling. J Dent Educ 1992; 56(5):322–6.
10. Knight GW. Toward faculty calibration. J Dent Educ 1997; 61(12):941–6.
11. Abbas F, Hart AAM, Oosting J, van der Velden U. Effect of training and probing force on the reproducibility of pocket depth measurements. J Periodontal Res 1982; 17:226–34.
12. O'Connor P, Lorey RE. Improving interrater agreement in evaluation in dentistry by the use of comparison stimuli. J Dent Educ 1978; 42(4):174–9.
13. Courts FJ. Standardization and calibration in the evaluation of clinical performance. J Dent Educ 1997; 61(12):947–9.