Faculty Development
Key words: radiographic interpretation, periodontology, faculty development, dental faculty, dental hygiene faculty, educational research, student assessment
Submitted for publication 11/07/05; accepted 01/24/06
| Abstract |
|---|
Previous work revealed inaccuracy and variability among periodontal and preventive faculty in rating radiographic bone loss.6,7 Radiographic findings are important adjuncts to clinical examinations in establishing periodontal diagnosis, prognosis, and long-term evaluation of the periodontium.8 The position of the alveolar bone crest and its relationship to the tooth's cementoenamel junction (CEJ) and apex can be used to determine the linear degree of interproximal bone loss.9 The percent of alveolar bone loss, in conjunction with clinical parameters, is commonly used to determine the presence, degree, and extent of periodontitis.10
Inaccurate and inconsistent assessment of percent bone loss among clinical instructors is particularly problematic in an academic environment. Multiple instructors commonly oversee the diagnosis and treatment of dental school patients. Varied or inaccurate assessment of radiographs could lead to misdiagnosis, over- or undertreatment, or inadequate longitudinal evaluation of patients' periodontal conditions. Additionally, clinical instructors are responsible for teaching and assessing students' abilities to interpret radiographic findings. Clinical instructors' inaccurate and inconsistent evaluations of radiographs may be detrimental to student learning, assessment of student performance, and teaching effectiveness.11
A structured training program may improve accuracy and consistency among clinical instructors' ratings of percent bone loss. In this investigation, existing plain film radiographs meeting specific criteria were digitized and displayed by LCD projector. The use of a single method for projecting digitized images offered the advantage of standardized image projection for training large groups. The purpose of this investigation was to determine the accuracy and consistency of clinical instructors' ratings of percent bone loss for a series of digitized intraoral radiographic images in conjunction with a structured training program.
| Methods |
|---|
Radiographs were prepared for projection by scanning them with a flatbed Microtek ScanMaker 8700 scanner and ScanWizard Pro 7.0 software at a scanning resolution of 300 pixels per inch. Digitized images were imported into Microsoft PowerPoint and projected via LCD projector at a resolution of 1024 x 768 in a dimly lit room. Two of the authors (SKL and HJT) judged the digitized radiographic images to be of acceptable quality after minor greyscale adjustments.
The "actual" amount of bone loss was determined independently by three of the authors (SKL, HJT, and PSR), as described previously.7 These authors viewed the duplicated plain film radiographs on a standard view box separately, without consultation with one another, in an artificially lit room and measured bone loss to the nearest 5 percent using a Schei ruler.12 The Schei ruler used was a plastic transparent ruler with a 2 mm thick marking at its margin and a series of equidistant lines radiating from a center point, each representing 5 percent bone loss. The 2 mm thick marking was placed on the tooth's CEJ, and one of the radiating lines was placed on the tooth's apex or most apically positioned apices. The "actual" amount of bone loss was determined by identifying the position of the alveolar bone crest relative to the ruler's markings. One discrepancy in rating bone loss occurred among the authors and was discussed until consensus was reached. Twenty-four percent of test teeth had no bone loss, 24 percent had <15 percent bone loss, 28 percent had between 15 and 30 percent bone loss, and 24 percent had >30 percent bone loss. Two of the authors (SKL and HJT) verified the correct choice categories, using the LCD projector and a computer-generated grid that was superimposed on study teeth.
Clinical instructors from the University of Michigan School of Dentistry including full- and part-time dental hygiene faculty, periodontal faculty (periodontists and general dentists), and periodontal graduate students were recruited into this investigation. These faculty members and graduate students will be collectively referred to as "clinical instructors." Clinical instructors simultaneously completed a twenty-seven-item pretest (referred to as pretest 1) immediately prior to a training program on radiographic interpretation (Figure 1). Question 1 asked clinical instructors to identify themselves as a dental hygiene faculty member, graduate student, or periodontal faculty member. Question 2 asked clinical instructors to describe their years of clinical experience as <5, 5–10, or >10 years. Questions 3–27 asked clinical instructors to rate percent bone loss for indicated teeth while simultaneously viewing magnified digitized radiographic images using an LCD projector by selecting one of the following categories: none, <15 percent, 15–30 percent, and >30 percent. Choices were based on American Dental Association (ADA) and American Academy of Periodontology (AAP)13–15 guidelines as outlined in the school's clinic manual for gingivitis and mild, moderate, and severe periodontitis, respectively. For the purpose of statistical analysis, numbers were assigned to each bone loss category as follows: none=(1), <15 percent=(2), 15–30 percent=(3), and >30 percent=(4). Written and verbal instructions were given to ensure consistent viewing practices among clinical instructors. Specifically, they were asked to rate percent bone loss 2 mm apical from the CEJ to the root apex, and teeth with mesial and distal percent bone loss discrepancies were to be rated by the greater percentage of the two. For each question, clinical instructors were given at least thirty seconds to rate percent bone loss, record their response on the questionnaire, and transmit their response via wireless remote. The wireless remote was part of an audience response system (ARS) that allowed "real-time" display of responses during phase one of the training program. However, during the pretest and post-tests it was used for data collection only. Discrepancies between written and transmitted responses were omitted from the research database.
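The rating rule and category coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the millimetre inputs are hypothetical, percent bone loss is taken as the fraction of the distance from a baseline 2 mm apical to the CEJ to the root apex, and "none" is treated as 0 percent after rounding to the nearest 5 percent. The study itself used a Schei ruler, not a calculation of this kind.

```python
import math

# Category codes as assigned for the statistical analysis in the text.
CATEGORY_CODES = {"none": 1, "<15 percent": 2, "15-30 percent": 3, ">30 percent": 4}


def percent_bone_loss(crest_mm, apex_mm, baseline_mm=2.0):
    """Percent bone loss rounded to the nearest 5 percent.

    crest_mm and apex_mm are distances measured apically from the CEJ
    (hypothetical inputs). The baseline sits 2 mm apical to the CEJ.
    """
    if apex_mm <= baseline_mm:
        raise ValueError("apex must lie apical to the baseline")
    lost = max(crest_mm - baseline_mm, 0.0)
    raw = 100.0 * lost / (apex_mm - baseline_mm)
    # Round half up to the nearest multiple of 5.
    return int(math.floor(raw / 5.0 + 0.5) * 5)


def category(percent):
    """Map a percent value to one of the four rating categories."""
    if percent <= 0:
        return "none"  # assumption: "none" means 0 percent after rounding
    if percent < 15:
        return "<15 percent"
    if percent <= 30:
        return "15-30 percent"
    return ">30 percent"


def rate_tooth(mesial_percent, distal_percent):
    """Rate a tooth by the greater of its mesial and distal percentages."""
    return CATEGORY_CODES[category(max(mesial_percent, distal_percent))]
```

For example, `rate_tooth(10, 25)` uses the greater of the two surfaces, so the tooth would fall in the 15–30 percent category.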
Data collected from the pretest and post-tests were analyzed for accuracy and consistency among clinical instructors. Sensitivity and specificity are usually used as indices of accuracy, yet they are not defined in situations with more than two categories. Therefore, the Kappa coefficient was used to describe both agreement among the three occasions (pretest, post-test 1, and post-test 2) and accuracy, defined as agreement with the correct choice. Accuracy was also measured by differences from the correct choice in two ways. One dependent variable was the difference between the clinical instructors' ratings and the correct choice; this variable is indicated as "difference" in all tables. This difference is thus the signed rater error and reflects net deviation from the correct choice in one direction. A positive difference indicates an overestimation of bone loss, and a negative difference indicates underestimation of bone loss. The second dependent variable used in the final analysis was the absolute value of this difference. A zero indicates a correct choice, and a positive value reflects overall deviation from the correct choice in either direction. This variable is indicated as "absolute" in all tables. Both the arithmetic difference and absolute difference are necessary because there may be zero average difference while the absolute difference is non-zero, and if there is non-zero absolute difference, it is necessary to describe the direction of the difference. Disagreement was analyzed using repeated-measures, mixed-models analysis with the following independent variables in the ANOVA model: three clinical instructor groups, four correct choice categories, twenty-five radiographs, three occasions, and all possible two-way interactions of these effects. These analyses allowed for dependency of the ratings done by the same clinical instructor across both the multiple radiographs and the three occasions.
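To make the two error measures and the chance-corrected agreement concrete, the following minimal sketch computes them for coded ratings (1 to 4). The function names are hypothetical, and Cohen's unweighted kappa is assumed, since the text does not state which Kappa variant was used. The mixed-models analysis itself is not reproduced here.

```python
from collections import Counter


def rater_errors(ratings, correct):
    """Return (mean signed difference, mean absolute difference).

    A positive signed difference means bone loss was overestimated;
    a negative one means it was underestimated.
    """
    diffs = [r - c for r, c in zip(ratings, correct)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    mean_abs = sum(abs(d) for d in diffs) / n
    return mean_diff, mean_abs


def cohen_kappa(a, b):
    """Cohen's unweighted kappa between two equal-length rating lists."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# One overestimate and one underestimate cancel in the signed difference
# but not in the absolute difference.
correct = [1, 2, 3, 4]
ratings = [2, 1, 3, 4]
mean_diff, mean_abs = rater_errors(ratings, correct)
```

In this toy example the mean signed difference is zero while the mean absolute difference is 0.5, which is the situation the paragraph above gives as the reason for reporting both variables.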
Accurate ratings are consistent since they all center on the correct choice. Where ratings are not accurate, they may be consistent (centering around an inaccurate value with little variability) or inconsistent (varying widely). Consistency is thus measured by the standard deviation (SD) of the ratings (the square root of the average squared difference between each rating and the mean of all the ratings provided). To look for differences in consistency, a mixed-model, heterogeneous-variance analysis tested for standard deviation differences between the three clinical instructor groups, the four correct choice categories, and the three occasions.
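The distinction between accuracy and consistency can be illustrated with a short sketch. It assumes the sample standard deviation (n minus 1 in the denominator), which the text does not specify; the function name and the example ratings are hypothetical and are not study data.

```python
from statistics import mean, stdev


def summarize(ratings, correct_choice):
    """Return (mean signed error, SD of the ratings) for one tooth."""
    mean_error = mean(ratings) - correct_choice
    sd = stdev(ratings)  # sample SD; an assumption, not stated in the text
    return mean_error, sd


# Correct choice is category 2 (<15 percent) in both hypothetical cases.
consistent_but_inaccurate = summarize([3, 3, 3, 3], 2)
accurate_on_average_but_inconsistent = summarize([1, 2, 3, 2], 2)
```

The first set of ratings is off by one category with no spread, while the second averages to the correct choice but has an SD of about 0.82, so the two measures capture different failures.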
| Results |
|---|
Seventeen clinical instructors completed post-test 2. The instructors were three dental hygiene faculty members, five graduate students, and nine periodontal faculty members (Table 1). There was no change in years of clinical experience for the instructors who completed the pretest as compared to those who completed post-test 2. Discrepancies were noted between written and transmitted responses for 1.3 percent of ratings; these ratings were omitted from the database. For teeth with no bone loss, 92.2 percent (94/102) of the clinical instructors' ratings were accurate. Eighty-two percent, 77.4 percent, and 90.2 percent of the clinical instructors' ratings were accurate for categories <15 percent, 15–30 percent, and >30 percent bone loss, respectively. Overall, clinical instructors' agreement with the correct choice was 85.2 percent. When corrected for chance agreement, this agreement was Kappa=80.3 percent (SE=2.3 percent).
Twenty-two clinical instructors provided ratings for both the pretest and post-test 1. The twenty-two clinical instructors consisted of four dental hygiene faculty members, eight graduate students, and ten periodontal faculty members. Their ratings were directly compared, and agreement was 67.3 percent (Kappa=56.5 percent, SE=2.7 percent) (Table 3, upper panel). Seventeen clinical instructors provided ratings during both post-tests 1 and 2. The seventeen clinical instructors consisted of three dental hygiene faculty members, five graduate students, and nine periodontal faculty members. Their ratings were directly compared, and agreement was 76.7 percent (Kappa=68.9 percent, SE=2.8 percent) (Table 3, middle panel). Seventeen clinical instructors provided ratings during the pretest and post-test 2. The seventeen clinical instructors were three dental hygiene faculty members, five graduate students, and nine periodontal faculty members. Their ratings were directly compared, and agreement was 67.8 percent (Kappa=57.1 percent, SE=3.1 percent) (Table 3, bottom panel). As accuracy improved from pretest to post-test 1 (Kappa=52.7 percent to 68.7 percent), agreement between these two occasions was relatively low (67.3 percent). Subsequently, as accuracy improved slightly from post-test 1 to post-test 2 (Kappa=68.7 percent to 78.7 percent), agreement between these two occasions was higher (76.7 percent).
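For readers less familiar with chance correction, the relation between raw agreement and Kappa can be written out as below. This assumes the reported statistic is Cohen's unweighted kappa, which the article does not state. Here p_o is the observed agreement and p_e is the agreement expected by chance. Plugging in the post-test 2 figures reported above (85.2 percent agreement with the correct choice, Kappa=80.3 percent) gives an implied chance agreement of about one in four, which is roughly what four nearly equally populated categories would be expected to produce.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Longrightarrow\quad
p_e = \frac{p_o - \kappa}{1 - \kappa}
    = \frac{0.852 - 0.803}{1 - 0.803}
    = \frac{0.049}{0.197}
    \approx 0.25
```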
Using a mixed-model heterogeneous-variance analysis, it was determined that the variability of the difference (clinical instructors' ratings minus the mean of the ratings provided) did not depend upon the three clinical instructor groups, but did depend upon the four correct choice categories and three occasions (LR chi square=215, df=11, p<0.0001). That is, there was more consistency (less variability) for correct choice categories none and >30 percent bone loss across time. However, this trend was not observed in the middle two categories (p<0.0001). That is, within both categories <15 percent and 15–30 percent bone loss, consistency of clinical instructors' responses remained unchanged across time (typical SD was approximately 0.40). Within category >30 percent bone loss, by contrast, consistency decreased from pretest (typical SD=0.16) to post-test 1 (typical SD=0.26) and then increased at post-test 2 (typical SD=0.18). The predominant increase in consistency was in correct choice category none, where the SD decreased from 0.34 at the pretest to 0.14 at both post-test occasions.
Overestimation of bone loss occurred during the pretest more often than underestimation as indicated by positive mean differences for categories none, <15 percent, and 15–30 percent bone loss (Table 4, difference column). In the category <15 percent, 37.2 percent of clinical instructors' ratings were given as 15–30 percent bone loss, and only 4.2 percent were given as no bone loss. Similarly, in category 15–30 percent, 34.2 percent of clinical instructors' ratings were given as >30 percent bone loss, and only 17.3 percent were given as <15 percent bone loss. From the pretest to post-test 1, accuracy of ratings in categories <15 percent and 15–30 percent increased, and overestimation of bone loss decreased by half. There was an increase in underestimation of bone loss (decrease in accuracy) in category >30 percent between the pretest and post-test 1, but by the second post-test, accuracy had returned to its original high level. The increase in accuracy from post-test 1 to post-test 2 is particularly evident in categories <15 percent and 15–30 percent, where underestimation and overestimation of bone loss decreased, respectively.
| Discussion |
|---|
Our results show clinical instructors' agreement with the correct choice overall improved with time. The greatest improvement was seen immediately after the first phase of the training program, yet accuracy continued to get better from post-test 1 to post-test 2. The mean difference and absolute difference improved in categories none, <15 percent, and 15–30 percent bone loss, yet worsened in category >30 percent bone loss immediately after the first phase of the training program. In this category, the difference and absolute difference improved from post-test 1 to post-test 2. Additionally, consistency of clinical instructors' responses initially decreased and then increased in category >30 percent bone loss. That is, the accuracy and consistency of ratings worsened immediately following phase one of the training program. Participation in this component of the training program may have been detrimental to clinical instructors' ability to judge bone loss >30 percent. Improvement in accuracy and consistency among clinical instructors' ratings was noted from post-test 1 to post-test 2 as agreement with the correct choice approached its initial high value. It may be that stressing the underestimation of bone loss in this category during the second phase of the program addressed any weakness of the training program. It is also possible that clinical instructors went back to judging severe bone loss in the manner they were accustomed to before participating in the program. Furthermore, it may be that the decrease in the initially high accuracy and consistency among clinical instructors was due to regression towards the mean.
The amount of error varied between the four bone loss categories. The greatest improvement of accuracy and consistency among clinical instructors' ratings occurred in correct choice category none. Greater inaccuracies and inconsistencies are not unexpected in categories <15 percent and 15–30 percent bone loss since errors can occur on both sides of these middle categories. However, as suggested previously, it may be that bone loss of <15 percent and 15–30 percent was more difficult to assess than none or >30 percent, or that the teeth and the actual amounts of bone loss selected for this study contributed to the greater errors observed in these two categories.7
Previous work found periodontal faculty members to have significantly less error than dental hygiene faculty members in categories <15 and 15–30 percent bone loss.7 There is some evidence that the amount of change in rater error, as a result of the training program, was not consistent for the three clinical instructor groups. Since the periodontal faculty began with nominally more accurate ratings, the amount of improvement possible was smaller than for the other two groups. It is not unexpected that dental hygiene faculty members' accuracy rates were initially lower than those of the other two groups since they are not diagnosticians nor do they routinely perform in-depth clinical assessments on a vast array of periodontal patients. In general, rater error could occur due to poor digitized radiographic image quality, use of a projector for displaying these magnified images, anatomical landmarks that are indistinguishable or difficult to recognize, or rating bone loss from a distance less than or greater than 2 mm apical from the CEJ, as elaborated on earlier.7 Rater error could have persisted throughout the duration of this study for any of these reasons or may be a result of clinical instructors holding onto strongly held beliefs29 or a reflection of the training program's effectiveness and duration. Our results show an improvement in the difference and absolute difference between the three occasions. It may be that extending the program and concentrating on areas where errors persist could further improve accuracy and consistency of clinical instructors' responses.
Overestimation of radiographic bone loss has been reported previously where the "gold standard" against which clinicians' ratings were compared was direct surgical or Schei ruler measurements.7,30–32 Immediately after phase one of the training program, overestimation of percent bone loss decreased by half in categories <15 percent and 15–30 percent bone loss, resulting in an improvement in accuracy. However, in category >30 percent there was an increase in underestimation of bone loss, resulting in a decrease in accuracy. At the second post-test, there was less underestimation of bone loss, and the accuracy and consistency of clinical instructors' responses returned to their originally high values.
The percent of alveolar bone loss is an important component in establishing a diagnosis of periodontitis and managing the disease over time.10 Categories of bone loss used in this investigation (none, <15 percent, 15–30 percent, and >30 percent) help establish diagnoses of gingivitis and mild, moderate, and severe periodontitis, respectively. These categories make clinicians aware of and sensitive to all diagnostic findings and treatment needs. For example, progression of bone loss from 15 to 30 percent carries with it the potential for more complex treatment and/or potential specialty referral in order to achieve therapeutic goals. Accurate and consistent radiographic interpretation coupled with clinical findings is essential for establishing initial periodontal diagnosis and long-term follow-up of a patient.33 In a dental school setting, where multiple instructors participate in the care of a single patient, inaccurate and inconsistent ratings of percent bone loss could be particularly problematic. That is, differences among clinical instructors could lead to a variety of periodontal diagnoses, prognoses, and treatment recommendations, which ultimately could result in over- or undertreatment. Inaccuracies and inconsistency among clinical instructors may also influence students' abilities to correctly rate radiographic bone loss or relate these findings to clinical findings, which are needed to adequately diagnose and manage periodontal patients. Furthermore, variations among clinical instructors could negatively influence assessment of student performance and teaching effectiveness. Clinical instructors in most educational programs are considered content experts and evaluate students based on their ability to generate an answer consistent with theirs. If these experts' opinions are different on different occasions, then the ability to reliably assess student performance and evaluate teaching programs is lost.
Clinicians' ratings of radiographic bone loss should ideally be consistently accurate; however, this goal was not reached during the course of this study. This must be taken into consideration when teaching and assessing students' abilities to judge percent bone loss. It may be that the best way to ensure that students and clinical instructors alike are consistently rating percent bone loss accurately is to use the Schei ruler to verify the actual amount of percent bone loss, especially when it is thought to fall in the <15 percent or 15–30 percent categories or when the amount of bone loss is in question. The Schei ruler has been found to be accurate in determining bone loss as compared to surgical measurement, and it is efficient and easy to use.12 Additionally, computer-assisted radiography has been shown to improve the accuracy of detecting changes in alveolar bone.34–36 This technology could be an asset in the teaching and learning of radiographic assessment in dental and dental hygiene education. Unfortunately, it is not available in all dental schools or clinical practices.
The training program was designed to be relatively brief and ongoing and to provide immediate feedback to clinical instructors on their assessment of radiographic bone loss. This was made possible by utilizing a single projection system, the ARS, for displaying responses in "real-time" during phase one of the training program and reviewing test radiographs immediately after each post-test. A second post-test was administered to determine clinical instructors' recall of information some time after the initiation of the training program. The three-month interval was thought to be appropriate in testing clinical instructors' recall of information, and it was most convenient based on their other professional obligations. The improvement in accuracy and consistency among clinical instructors seen immediately following phase one of the program was seen again after three months. Therefore, the skills that clinical instructors gained as part of the program were sustained over time. It is important to note that the first phase of the program may have contributed to clinical instructors' inability to correctly judge percent bone loss of >30 percent since accuracy rates and consistency among clinical instructors worsened immediately afterwards. Possible reasons for this have been discussed earlier.
The number of clinical instructors participating in this study decreased over time, and nonparticipation could lead to sampling bias. The difference in pretest and post-test 1 response rates was influenced by the number of clinical instructors eligible to participate in each of these tests. Seven "new" clinical instructors (five graduate students and two periodontal faculty members) joined the department between the first two occasions (the pretest and post-test 1) and the third occasion (post-test 2). Under all testing conditions, clinical instructors viewed and rated digitized radiographic images and responded to test questions simultaneously yet independently, without consulting with one another. Since these "new" clinical instructors were tested under the conditions just described and had neither previously viewed the radiographic images nor participated in the training program, their responses were incorporated into the pretest data set. Sessions were offered once, and scheduling conflicts could have prevented clinical instructors from participating in a session or in a session's entirety given their other teaching, research, and clinical responsibilities. Differences in the number of clinical instructors participating in the training program are likely a result of scheduling conflicts. However, changes in response rates could be a reflection of clinical instructors' beliefs that the program was redundant or not useful or that their individual accuracy was adequate and participation in the program was no longer needed. Providing an opportunity for clinical instructors to critique the training program may have provided insight into further reasons for nonparticipation.
This program has other limitations. Digitized radiographic images were scanned using a relatively low resolution and displayed by a fixed-pixel projector. These images compared to plain films likely differed in resolution, contrast, greyscale manipulation, and magnification. This could have affected image quality and thus impacted the results of this investigation since clinicians' responses were compared to correct choice categories as determined by viewing plain films on a view box. However, it is important to note that two of the authors (SKL and HJT) independently confirmed correct choice categories using the LCD projector prior to the clinical instructors' viewing of digitized radiographic images. Clinical instructors may have discussed radiographs and their ratings of percent bone loss with one another throughout the course of this investigation, which could have influenced the results. It could also be argued that multiple viewings of the same radiographs could have contributed to the increase in accuracy and consistency of clinical instructors' responses reported here. However, radiographs were randomly viewed at each occasion, and three months separated post-test 1 and post-test 2, making it difficult for clinical instructors to base their ratings of bone loss on familiarity of the radiographs alone. It is more likely that the skills the clinical instructors gained as part of this program were applied during these post-tests.
It may be acceptable to have inconsistencies among clinical instructors when there are a number of subjective elements that go into making clinical decisions as long as the decisions are based on evidence or accepted practice guidelines. Making determinations of bone loss is based on relationships between anatomical landmarks, which can actually be measured. Therefore, determining percent bone loss is less subjective than interpretation of other clinical findings that cannot be directly measured, and inconsistencies among clinical instructors are less expected and less acceptable. Further attempts at training and calibrating instructors are needed so that the accuracy and consistency of their ratings can be enhanced and teaching effectiveness and students' abilities can be adequately evaluated. The training program resulted in a general improvement in accuracy for most categories; however, greater improvement in accuracy and consistency among clinical instructors may be possible with extension of the program to include more radiographs. An additional next step would be to determine if the gains in accuracy and consistency of clinical instructors' assessments of percent bone loss could be "transferred" to plain films. Previous work showed rating percent bone loss by viewing projected digitized radiographic images was only slightly different in terms of accuracy and consistency as compared to viewing plain films via view box.7 Therefore, skills learned as part of this training program should be easily applied to plain film viewing.
| Conclusion |
|---|
| Acknowledgments |
|---|
| Footnotes |
|---|
| REFERENCES |
|---|