Educational Methodologies
Key words: criteria, evaluation, instruction, calibration
Submitted for publication 06/04/07; accepted 10/10/07
| Abstract |
|---|
Our examination of the learning environment discovered only a few examples of excellent preparations/restorations available to the students and, if present, the models of good quality (for example, prepared teeth) were often three to four times larger than life-sized. Evaluation forms that did exist were generally used for summative grading purposes only. Very rarely were the forms used by either students or faculty in day-to-day activities. When item analyses were conducted on student examination products, it was found that the weakest performance on student projects corresponded with the least clearly described criteria on the evaluation form.
In 1982, Dr. Richard Mackenzie et al. observed that "obtaining agreement on observations is not a trivial matter in dental education."1 Their critical analysis of factors that contributed to faculty disagreements resulted in identification of sixteen issues that, in their opinion, needed to be addressed (Figure 1). They further argued that the reliability of any checklist was not sufficient and that validity (correlation of an item with ultimate clinical success) must also be considered.
Haj-Ali and Feil described an attempt to improve faculty calibration using a three-point rating scale describing an amalgam preparation.3 Acknowledging Mackenzie et al.'s concerns related to reliability, they undertook an extensive training program with selected faculty. Results showed that, with use of a standardized rating form and training program, faculty agreement rose and held for the ten-week study. Overall, there was a 10 percent improvement in agreements, with improvement seen in nine of the thirteen individual criteria.
In a study describing a nongraded clinical assessment program, Taleghani et al. attempted to address Mackenzie et al.'s factors by requiring faculty to document, in writing, all student clinical performance that fell short of clinical acceptability.4 Using new forms and faculty training sessions, these authors concluded that verbal interactions between faculty and students and student satisfaction with the nongraded system were both viewed positively. Taleghani et al. acknowledged that "faculty calibration is better organized and sequenced," but did not actually document improvement in calibration of either group.
Recently, the Commission on Dental Accreditation's standards for American dental schools5 suggested the clear need for evaluation forms. Standard 2–8 states that "the dental school must employ student evaluation methods that measure the defined competencies." Standard 2–23 requires "graduates must be competent in the use of critical thinking and problem solving related to the comprehensive care of patient." Finally, Standard 2–25 requires documentation of competency of graduates in fourteen (2–25a-n) specific clinical skill domains. The implications of these standards on dental educators are:
It is apparent that there must be corresponding actions taken to respond to these challenges. First, for each clinical procedure that requires learner competence, faculty must write or revise evaluation forms (Develop Evaluation Forms). Second, educators must provide opportunities for the faculty and learner to discriminate (recognize good and bad examples of a criterion) and apply the established evaluation criteria (Train Criteria). And third, criteria on each evaluation form must be applied to demonstrate the attainment of competency (Use Evaluation Forms).
In the education literature, a learning paradigm6,7 is described and validated stating that the ability to recognize the critical features of the end product is a subskill of a learner's ability to produce that end product. Recognition is the ability to make the necessary discriminations to distinguish good outcomes from poor ones. If a problem/error is never recognized, product improvement only occurs by chance, not by problem solving. The implication for dental educators (and learners) is that recognition must be trained first. Feil and Reed have identified this sequence and described its critical importance in their work on knowledge of results in student motor skill acquisition.8
In a study examining the relationship between student recognition skills and resulting product performance, a correlation was demonstrated between recognition and production.9 It was found that over half of the variance in product scores was accounted for by student recognition score (self-evaluation). Only those students who improved their ability to accurately evaluate their preparations, no matter at what level of evaluation they started, were able to improve the quality of their preparations. If, as this research suggests, improved evaluation skill leads to improved performance, it is imperative to determine the conditions that will best enable students to develop recognition skill. The first condition may be the availability of valid criteria. However, valid criteria alone may not be sufficient. What may be needed are valid criteria within a format (evaluation form) that can facilitate useful feedback and active participation of the learner in the learning environment.
The guidelines for writing criteria for evaluation forms presented here address the two domains of validity and reliability. A criterion is considered valid if it is a vital determinant of a successful outcome. In clinical dentistry, success is prevention of disease or restoration of health. A valid criterion, therefore, must measure what a practitioner actually does in patient care. Validity is determined through evidence obtained from laboratory and clinical research. No criterion can be considered for inclusion on an evaluation form used in product analysis without the establishment of its validity. Reliability refers to the correct and consistent application of the valid evaluation criteria. If an evaluation criterion is vague or imprecise or if it is awkwardly formatted, both inter- and intra-rater reliability suffer. The end result then is the worst possible outcome, a confused learner. Mackenzie et al.'s list of sixteen factors serves as a firm foundation for issues to address.
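As a rough illustration of what "consistent application" can mean in practice, the sketch below computes simple percent agreement between two raters who scored the same product on the same criteria. The statistic, the function name `percent_agreement`, and the sample calls are assumptions made for illustration; the studies cited above do not specify this calculation, and percent agreement does not correct for agreement that occurs by chance.

```python
def percent_agreement(rater_a, rater_b):
    """Share of criteria on which two raters made the same call."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of criteria")
    # Count the positions (criteria) where the two calls are identical.
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical calls by two faculty members on five criteria of one product.
faculty_1 = ["Excellent", "Excellent", "Clinically Acceptable",
             "Standard Not Met", "Excellent"]
faculty_2 = ["Excellent", "Clinically Acceptable", "Clinically Acceptable",
             "Standard Not Met", "Excellent"]

print(percent_agreement(faculty_1, faculty_2))
```

The same function could be applied to one rater's calls on the same product at two different times to look at intra-rater consistency.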
| Criteria for Writing Effective Evaluation Forms |
|---|
2. Criteria are collectively valid. (Mackenzie et al. concerns addressed: faulty memory, incomplete coverage of dimensions, differences in background, and differences in mental processing.)
It is not sufficient to establish validity only for each criterion. Designers of evaluation forms must also ensure that, when taken as a whole, the set of criteria completely covers the essential features of the procedure. That is, if a practitioner performs to a clinically acceptable level on each of the listed criteria, then the product will be clinically acceptable. There can be no "other" unidentified item (i.e., not listed on the evaluation form) that can be included in the evaluation. By ensuring the completeness of the evaluation form's description of the process, the learners know what they need to know.
3. Criteria are noncompensatory. (Mackenzie et al. concern addressed: degrees of leniency.)
If each criterion is valid and if the set of criteria is collectively valid, then it must follow that the criteria are noncompensatory. This means that a practitioner cannot do exceedingly well on one criterion to make up for a substandard performance on another. The necessary outcome of criteria being noncompensatory is that assigning point values to criteria is meaningless. Simple adding or averaging of points will necessarily hide poor performance on a specific criterion and thereby threaten the clinical outcome. It is important to remember that all evaluation forms result in categorical data, which means that arithmetic manipulations are inappropriate. Summative evaluation becomes not an adding of points, but rather a pattern matching exercise. For example, in Figure 3, there are fifteen criteria to address that describe the process. The grading pattern for fourth-year students (Figure 4) requires achievement of 80 percent of the criteria in the Excellent category and none in the Standard Not Met category in order to be assigned an "A." It follows, however, that another student who achieves 80 percent of the fifteen criteria in the Excellent category and one Standard Not Met does not pass. Note that the grading scale changes depending on the student year. It is critical that faculty realize that while the grading scale may change, the evaluation call itself must never change. If a performance on a criterion is deemed Standard Not Met, it must be called a Standard Not Met regardless of the student's year. Changing the assessment of any criterion based on year group of the learner (an act of leniency) only confuses the learner because the standard changes. In other words, it is acceptable to change the grading scale but not the evaluation standard.
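The contrast between averaging points and pattern matching can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it encodes only the fourth-year "A" rule described above (at least 80 percent of criteria Excellent and none Standard Not Met), the point values in `POINTS` are hypothetical and exist only to show what averaging hides, and the function names are ours rather than anything taken from Figure 4.

```python
from collections import Counter

EXCELLENT = "Excellent"
ACCEPTABLE = "Clinically Acceptable"
NOT_MET = "Standard Not Met"

# Hypothetical point values, used only to show what averaging hides.
POINTS = {EXCELLENT: 3, ACCEPTABLE: 2, NOT_MET: 1}


def mean_points(calls):
    """Compensatory scoring: average the (hypothetical) point values."""
    return sum(POINTS[c] for c in calls) / len(calls)


def earns_a_fourth_year(calls):
    """Pattern match for an "A": no Standard Not Met and >= 80% Excellent."""
    counts = Counter(calls)
    if counts[NOT_MET] > 0:
        return False  # one Standard Not Met cannot be offset by other criteria
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    return counts[EXCELLENT] * 100 >= 80 * len(calls)


# Two hypothetical students, each with 12 of 15 criteria called Excellent.
student_1 = [EXCELLENT] * 12 + [ACCEPTABLE] * 3
student_2 = [EXCELLENT] * 12 + [ACCEPTABLE] * 2 + [NOT_MET]

for name, calls in (("student_1", student_1), ("student_2", student_2)):
    print(name, round(mean_points(calls), 2), earns_a_fourth_year(calls))
```

With these sample calls the two averages are nearly identical (2.8 and about 2.73), yet only the first student matches the "A" pattern, which is the point of treating the calls as categorical data.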
Sequencing the criteria on evaluation forms provides several key benefits. For the learner, sequenced criteria segment procedures into discrete parts that can be identified, practiced, and related to other components of the skill being learned. An additional benefit for the learner is that each and every time self-evaluation is performed, the sequence of performance is reinforced. For the faculty, too, there is a benefit as they are more likely to remember the sequence and thus recall the relevant criteria.
We would suggest that establishing the sequence of valid criteria is best accomplished by having an expert perform the procedure and describe out loud what is being done and why. The expert's dialogue is recorded. Prompting questions from the recorder (such as "what do you do first? second? etc." and "how do you know when you have completed this step?") can be invaluable in writing an evaluation form.
Reliability: Establish Format
5. Criteria descriptions are aligned horizontally. (Mackenzie et al. concerns addressed: checkpoint ambiguity and unsystematic inspection.)
One issue in using evaluation forms is the ease of use for both educator and learner. To give meaningful formative assessment, it is useful for the faculty assessment to be assigned congruently to the learner's assessment. It is simply quicker to do this in a horizontal format than in a vertical one. There is also less chance of error in underlining or checking the correct criterion statement. In some cases, especially for essentially yes or no criteria, there may not be a statement about a specific criterion in each degree of excellence. While there are no examples of this in Figure 3, on an evaluation form used for root planing there is a criterion for calculus removal. The only two assessments are Calculus Removed (Excellent) and Calculus Remaining (Standard Not Met). Through horizontal alignment, less time is spent searching for the criterion in each degree of excellence.
Listing the criteria horizontally also leads the evaluator to consider all criteria for a given product/procedure. This is critical in ensuring that the learner receives complete feedback on the entire task. All too often, evaluators will only make a call for the poorest criterion of a given task, which fails to provide information on the rest of the criteria. Further, failure to make all the calls robs the course leader of valuable information that can be obtained when item analyses of performance examinations are performed to provide the data to direct and support course improvement.
6. Criteria are consistently numbered. (Mackenzie et al. concerns addressed: checkpoint ambiguity and unsystematic inspection.)
Numbering criteria consistently across all the degrees of excellence (Figure 3: Excellent, Clinically Acceptable, and Standard Not Met) provides the learner and the educator with much needed information in formative assessments to detect specific learner problems and to suggest remediation strategies. The learner can, and should, be encouraged to independently chart performance on each criterion over time. In so doing, the learner can identify specific areas for concentrated practice rather than just doing the procedure all over again.
The faculty can also use the student-collected data to design specific instructional tasks for the individual learner to address any identified deficiency. Having clear information allows the faculty to design purposely focused remediation programs. Again, the course leader can also use the item analysis data to determine where instructional methods are failing to achieve the desired learner outcomes. Once such a criterion is identified, the course leader can determine whether the criterion was taught, where it was taught, and how it was taught. The leader can then implement data-based improvement strategies for the next iteration of the course.
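Because the criteria are numbered consistently, an item analysis of this kind reduces to tallying calls by criterion number. The following is a minimal sketch, assuming each completed form is stored as a mapping from criterion number to the call made; the data layout, the function name `not_met_counts`, and the sample data are hypothetical.

```python
from collections import Counter

NOT_MET = "Standard Not Met"


def not_met_counts(evaluations):
    """Count Standard Not Met calls per criterion number across all forms."""
    counts = Counter()
    for form in evaluations:
        for criterion, call in form.items():
            if call == NOT_MET:
                counts[criterion] += 1
    return counts


# Hypothetical data: three completed forms, criteria numbered 1 to 3.
evaluations = [
    {1: "Excellent", 2: NOT_MET, 3: "Clinically Acceptable"},
    {1: "Excellent", 2: NOT_MET, 3: "Excellent"},
    {1: NOT_MET, 2: "Clinically Acceptable", 3: "Excellent"},
]

counts = not_met_counts(evaluations)
# The criterion with the most Standard Not Met calls is a candidate for
# reviewing whether, where, and how that criterion was taught.
print(counts.most_common(1))
```

The same tally applied to one learner's forms over time would give the per-criterion chart suggested above for directing concentrated practice.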
7. Format facilitates process. (Mackenzie et al. concerns addressed: checkpoint ambiguity, faulty memory, incomplete coverage of dimensions, and unsystematic inspection.)
Simply listing the criteria is not sufficient for either the learner or the faculty. As Mackenzie et al. point out, nothing should be left to chance: the more information given to the evaluator (whether a learner or a faculty member), the more likely the evaluator will perform in an acceptable fashion. The evaluation form should have columns for the steps, tests, criteria, and problem solving. By listing steps, the evaluator is led systematically through each criterion to ensure the entire process is considered. Having the tests listed also ensures that the evaluator is applying the criteria correctly. Providing a column or designated space for student-generated written statements is especially valuable. Requiring the learner to commit to writing what the observed problem is and to speculate on how to correct it provides the faculty with solid data on what the learner believes and understands. How many times have we identified a problem in a learner's product and had him or her agree with us, when we suspect the learner either didn't recognize the problem, couldn't recognize it, or wouldn't recognize it? By having the learner identify the problem and write the solution, the faculty member can quickly assess whether this is a problem in recognition of a problem, a misunderstanding of information, or perhaps a lack of critical relevant information. With this information in hand, a faculty member can either directly correct the deficiency immediately or design a piece of instruction for the learner that addresses the identified deficiency.
8. Levels of acceptable and unsatisfactory performance are visually distinguishable. (Mackenzie et al. concerns addressed: unstandardized aids to judgment and degrees of leniency.)
While perhaps self-evident, this point is important to the learner in developing problem-solving skills. With a clear demarcation, the learner can efficiently identify the severity of clinically significant problems at a glance. Problem-solving skill is enhanced on two fronts. First, the learner can develop corrective strategies that address the specific deficiency. Equally beneficial, the learner can then assess the effect(s) of the corrective action on other criteria before initiating the correction. Learning is enhanced as the learner develops an understanding of the interrelationships between the criteria, i.e., how modification to improve one criterion may lead to a decline in quality in another. For example, in attempting to smooth a wall of a cavity preparation, the outline form extension may be affected.
9. Evaluation form is labeled appropriately. (Mackenzie et al. concerns addressed: checkpoint ambiguity and faulty memory.)
The title of the evaluation form quickly alerts the evaluator to the task at hand, thereby eliminating confusion and directing focus. The importance of patient and provider identifiers (preferably chart numbers and identification numbers) is obvious. Date-stamping evaluation forms allows tracking progress on a specific task over time for both the learner and the faculty member. Labeling also provides evidence for breadth of experience, an important consideration in assessing competency. Finally, attending to these features will contribute to data collection for clinical research.
It is also important to give instructions for the student and the faculty on how the evaluation form is to be used. Directions for completing the form should be clearly delineated for both the learner and the faculty to facilitate data collection related to common errors and criterion ambiguity. (See Figure 3, "Instructions.")
10. Format is consistent with evaluation forms for other products and procedures. (Mackenzie et al. concerns addressed: checkpoint ambiguity, unsystematic inspection, differences in background, and differences in mental processing.)
With the emphasis on comprehensive care in contemporary dental education, consistency is a vital consideration to address for both the faculty and the learner. The learner needs to grasp the salient features of each procedure to be mastered. By having a uniform format for all evaluation forms, the learner only needs to learn the format once. The learner then can focus and concentrate on the criteria for each procedure, a much more efficient learning strategy. Similarly, faculty teaching in a multidisciplinary environment do not need to struggle with learning various departmental evaluation form formats, focusing instead on understanding and applying the criteria on the form.
Reliability: Establish Clarity
11. Number of degrees of excellence promotes high reliability. (Mackenzie et al. concern addressed: degrees of leniency.)
This issue may appear to be a difficult one to resolve. At first consideration, one might be convinced that the fewer criterion categories there are, the higher the reliability will be. We would suggest that, for competency-based education, serious consideration should be given to having two categories: Clinically Acceptable and Standard Not Met. One can make a cogent argument for defining the line between clinically acceptable and unacceptable as the critical discrimination to be made in patient care. It can be argued, we believe, that having only two categories is all that is necessary for licensure examination determinations. In academia, many will suggest that because most institutions require discrete grades (as opposed to Pass/Fail systems) we need to separate Excellence from Clinically Acceptable. We have also heard this argument extend to suggesting that, without defining Excellent, students will not be motivated to achieve. A third consideration is that, without an Excellent category, the learner may be denied useful feedback on performance. Each of these considerations has merit and needs to be decided locally.
There are those who might also argue that there should be two categories for Standard Not Met: those situations that are Unacceptable But Correctable and Unacceptable and Not Correctable. We would suggest that in a competency-based system this distinction not be made. The rationale is that a correctable error that is not corrected is still an error, threatens the success of the procedure, and is therefore a Standard Not Met.
The key to determining the number of degrees of excellence is the ability to clearly define the parameters for each category. If three or four (or more) categories can be defined so well that reliability can be demonstrated, then use them. Reliability of assessment for both the faculty and the student is the desired outcome. However, we urge that writers not try to force definitions that do not exist. For instance, if three categories can be well defined for most of the criteria, then use three, recognizing that one or two of the criteria within the form might only lend themselves to two categories. In these situations, we highly recommend that meeting the criterion be listed in the Excellent category and that the Clinically Acceptable category be left blank (see example in number 5 above).
12. Degrees of excellence are operationalized. (Mackenzie et al. concerns addressed: faulty memory, untrained estimation of size, incomplete operational definition, inadequacy of verbal definitions, and differences in mental processing.)
It has been said that "always" and "never" are words we should always remember never to use. The same admonition can be given for "slightly," "moderately," and "severely." In writing criteria, the authors should operationalize each criterion, i.e., provide measurement ranges, positional relationships, texture statements, etc. (Figure 3, criterion 1). It is also most beneficial when training criteria have actual size (or video clip) examples of the target (the ideal), as well as examples of errors for discussion and discrimination. An excellent source of errors is student performance examination products from previous years. Using these past examination products provides value in that student products reveal almost every error in every degree of excellence, and the faculty evaluations are the answer keys.
13. Terminology is consistent. (Mackenzie et al. concerns addressed: faulty memory, inadequacy of verbal definitions, and definition ambiguities.)
It is imperative that evaluation forms use consistent terminology. Learners are not only learning psychomotor skills, but they are also learning a profession's language. Using recognized terminology consistently reinforces the learner's practice with the new language of the profession, especially if the evaluation process includes dialogue. For example, in describing the preparation of a tooth for restoration in amalgam, we would suggest consistent use of "extension" and "over/underextension" rather than "extension" followed by "wide" and "narrow." Figure 3, criterion 4 addresses this very issue in an evaluation form used in oral medicine.
14. Tests are described specifically. (Mackenzie et al. concerns addressed: untrained estimation of size, unstandardized aids to judgment, unspecified methods of observing, discrepancies in visual acuity, inadequacy of verbal definitions, inadequate communication with nonverbal examples, and differences in background.)
A key to reliability in evaluation is to know exactly how to apply each criterion. Specific instructions, included on the evaluation form, need to be given to the evaluator. Issues such as instruments to use, reference point, and/or method of observation need to be addressed. These instructions, useful for the faculty and the learner, can be powerful tools for identifying key learning issues to be addressed. Figure 3, criterion 10 is an example.
15. The set of criteria is broad enough to cover an entire range of tasks and clinical conditions. (Mackenzie et al. concerns addressed: incomplete coverage of dimensions, unspecified exceptions, incomplete operational definition, differences in background, and differences in mental processing.)
To facilitate learning, it is helpful for the learner to understand the complete set of criteria for a clinical task. For example, in the concepts of the amalgam preparation, it is most likely easier for the learner to see all of the concepts and critical features on one evaluation form rather than having separate forms for Class I, II, V, etc. Having multiple evaluation forms that are narrowly focused can tend to parse a procedure to such an extent that the learner loses track of the interrelationships among the individual criteria. Having multiple evaluation forms also creates the potential for using an inappropriate form.
Having the evaluation form cover the broad range of clinical conditions demands that educators look carefully to ensure that criteria selected are useful in promoting the application of clinical skills rather than producing widgets. For instance, are we interested in "ensuring instrument control for effectiveness and safety" or are we interested in "creating a fourth finger rest"?
| Summary |
|---|
We believe that, once generated, criteria forms should be the basis of curriculum and course design. If evaluation forms are designed as suggested here, they will embody the learning objectives, the sequence of presentation, and the design of learning exercises. Effective evaluation forms will provide feedback to the learner, the supervising faculty, the course designer, and the curriculum manager as well. We would also suggest that design of evaluation forms may prompt clinical and basic science research problems to address and solve.