Educational Methodologies
Key words: dental curriculum, computer-assisted instruction, communication, educational measurement
Submitted for publication 06/28/04; accepted 11/02/04
| Abstract |
|---|
| The Ability to Write Effectively |
|---|
There have been few published studies of the quality of dentist-authored written communication; however, those that have appeared suggest that significant deficiencies exist in the quality of these materials.8,9 As Alexander recently noted, appropriate written communications can be an important supplement to verbal instructions and can improve both patient satisfaction and compliance.10 Alexander also observes that dentistry has lagged behind the other health professions in the systematic evaluation of written material provided to patients.
Yet, if such written materials are deficient in design, it must also be recognized that dental education is only one part of a complex system in which there is no guarantee that entering students possess the advanced writing skills necessary to communicate with a diverse patient population. Studies of the writing performance of high school students and college freshmen continue to suggest that writing ability is a complex, difficult goal of education at all levels. A recent report on California's colleges and universities found that fewer than 50 percent of freshmen could produce technically adequate papers or demonstrate the ability to analyze arguments or synthesize information.11 The commission stated, "What most students cannot do is write well. At least, they cannot write well enough to meet the demands they face in higher education and the emerging work environment. Basic writing itself is not the issue; the problem is that most students cannot write with the skill expected of them today." Further, citing the National Assessment of Educational Progress, the commission report states that "most students have mastered writing basics, but few are able to create precise, engaging, coherent prose." To paraphrase a recent article in the Chronicle of Higher Education, Johnny may not be able to write, even if he went to Princeton.12
Nevertheless, dental professionals need advanced writing skills to function as competent health care providers. In the early 1980s, Bjork13 identified the need to incorporate training in scientific writing into the preparation of dentists and described the philosophy, structure, and content of a professional writing course for dentists; however, such courses do not appear to have become an integral part of the undergraduate dental curriculum. The reasons are evident: assessing writing, like teaching it, is complex and difficult work. The National Commission on Writing in America's Schools and Colleges acknowledges these difficulties and suggests increased reliance on information technology to enhance students' skills in written communication. The commission recommended that private and public leaders "work with educators to apply new technologies to the teaching, development, grading, and assessment of writing" and that "the nation should invest in research that explores the potential of new and emerging technologies to identify mistakes in grammar, encourage students to share their work, help assess writing samples, and incorporate software into measuring student-writing competence." If even the National Commission suggests that the time needed to assess, correct, and give feedback to students poses significant barriers to the implementation of writing programs, what is the profession of dentistry to do to assess and increase the writing abilities of its students?
| The Present Study |
|---|
| Materials and Methods |
|---|
Beginning with the freshman class in the fall of 2001 and continuing with the freshman classes of 2002 and 2003, all written assignments in this course were required to be submitted electronically via the WebCT™ distance-learning platform.18 Students downloaded the essay question and instructions to a computer at the location of their choice (school, home, or elsewhere), completed the required essay, and uploaded it to the course's WebCT site. Submissions were required to be in HTML, enriched text, or Microsoft Word format; students were strongly encouraged to submit their essays as Word documents. Students were given two weeks to complete the assignment. Because students were instructed that their essay should be an example of their best work, they were permitted to retract, replace, and resubmit their essays without penalty until midnight on the due date. Very few chose to replace their initial submissions. Where multiple submissions were made, only the final submission was included in the analysis.
Following the close of the submission period each year, a duplicate set of essays was created and the original set was graded. Grades and comments were transmitted electronically to students. Essays submitted in text or HTML format were converted to Microsoft Word documents. A research assistant "cleaned" the duplicate set of essays by removing all identifiers as well as page numbers, headers, footers, headings, subheadings, and text enhancements. The result of this cleaning process was a uniform set of Microsoft Word documents, each consisting of plain text without any enhancements.
Using procedures available in Microsoft Word 2000, the following statistics were generated and recorded for each essay to evaluate surface features of the samples: counts of words, characters, paragraphs, and sentences; and averages of sentences per paragraph, words per sentence, and characters per word.19 In addition, the following readability measures were generated and recorded to assess syntactic complexity: the percentage of passive sentences; readability as measured by the Flesch Reading Ease score; and language sophistication as measured by the Flesch-Kincaid Grade Level (both measures are described in the following paragraphs). Although each essay was also checked for spelling and grammar, errors were not recorded, because the technical quality of students' submissions was beyond the scope of this project, which focused on testing the feasibility of conducting a computer-based assessment of students' writing skills.
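The counts and averages listed above can also be approximated outside Word. The Python sketch below is not Word 2000's algorithm. It assumes that paragraphs are separated by blank lines, that sentences end in ".", "!", or "?", that words are whitespace-delimited tokens, and that "characters" excludes whitespace but includes attached punctuation. Its numbers will therefore differ somewhat from Word's document statistics (an abbreviation such as "Dr." would split a sentence, for example).

```python
import re


def surface_statistics(text: str) -> dict:
    """Approximate per-essay surface statistics for plain text.

    Assumptions (not Microsoft Word's algorithm): paragraphs are separated by
    blank lines, sentences end with '.', '!' or '?', words are
    whitespace-delimited tokens, and "characters" excludes whitespace.
    """
    words = text.split()
    if not words:
        raise ValueError("essay contains no words")
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Text with no terminal punctuation still counts as a single sentence.
    n_sentences = max(1, len(sentences))
    characters = sum(len(w) for w in words)
    return {
        "words": len(words),
        "characters": characters,
        "paragraphs": len(paragraphs),
        "sentences": n_sentences,
        "sentences_per_paragraph": n_sentences / len(paragraphs),
        "words_per_sentence": len(words) / n_sentences,
        "characters_per_word": characters / len(words),
    }


essay = "See Spot run. Run Spot run.\n\nSally plays with Puff."
stats = surface_statistics(essay)
print(stats["words"], stats["sentences"], stats["paragraphs"])  # 10 3 2
```

In a batch setting, the same function would be applied to each cleaned essay and the resulting dictionaries collected for analysis.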
The Flesch Reading Ease score and the Flesch-Kincaid Grade Level scale are both widely used, reliable, and valid indicators of readability.20,21 The two instruments measure somewhat different aspects of readability, but both are calculated from the same two quantities: average sentence length (words per sentence) and average word length (syllables per word). The "Help" files built into Microsoft Word 2000 provide the formulas used to calculate these scores.
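For reference, the standard published forms of the two formulas are shown below. We assume, without having checked the Word 2000 Help text line by line, that Word uses these coefficients. Both scores are linear in the same two ratios but weight them with opposite signs:

$$
\text{Reading Ease} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}
$$

$$
\text{Grade Level} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
$$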
Using a 100-point scale, the Flesch Reading Ease score rates text for ease of understanding. A score of 100 indicates text that is very easy to understand and 1 indicates that the text is very difficult to understand. The Flesch-Kincaid Grade Level score indicates the U.S. grade school level at which the text is written. Thus, a score of "6" would indicate that a sixth grader could read and understand the document.
To illustrate the meaning of and relationship between the Flesch Reading Ease score and the Flesch-Kincaid Grade Level, it is useful to consider how representative passages are scored. Applying these measures to the first paragraph of this article, for example, yields a Flesch Reading Ease score of 28.3, indicating that the paragraph is difficult to read and understand. The Flesch-Kincaid Grade Level instrument indicates that the same paragraph is written at a twelfth-grade level, meaning that a twelfth grader should be able to read and understand it. Note, however, that this instrument cannot indicate that text is written above a twelfth-grade level and thus may underestimate the reading skill needed to read and interpret a text. The first paragraph of this article therefore requires a high level of skill to read and interpret, which may reflect the level of sophistication at which the authors write. To illustrate the other extreme, the authors developed the following passage modeled after a reading primer.
See Dick and Spot. See Spot run. Run Spot run. See Sally and Puff. See Puff run. Run Puff run.
Sally plays with Puff. Dick watches Sally and Puff.
Sally and Dick grow up to be dentists.
Applying the Flesch Reading Ease instrument to this sample passage yields a score of 100, indicating that the passage is extremely easy to read. The Flesch-Kincaid Grade Level instrument rates this passage at 0.0, meaning that a beginning reader could read and interpret the text.
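To see why the primer passage lands at the ends of both scales, the Python sketch below applies the standard published Flesch formulas to it. Two assumptions are labeled in the code. First, syllables are counted with a crude vowel-group heuristic, which is not Word's documented method. Second, Word is assumed to clip Reading Ease to 0 to 100 and Grade Level to 0 to 12. For this passage the heuristic finds 9 sentences, 37 words, and 43 syllables. The unclipped scores fall just outside both ranges, which is consistent with Word reporting 100 and 0.0.

```python
import re

# Primer passage from the text above.
PRIMER = (
    "See Dick and Spot. See Spot run. Run Spot run. "
    "See Sally and Puff. See Puff run. Run Puff run. "
    "Sally plays with Puff. Dick watches Sally and Puff. "
    "Sally and Dick grow up to be dentists."
)


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per run of vowels, 'y' included, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_scores(text: str) -> tuple[float, float]:
    """Return unclipped (Reading Ease, Grade Level) from the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)  # average words per sentence
    spw = syllables / len(words)       # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade


def clip_like_word(ease: float, grade: float, max_grade: float = 12.0) -> tuple[float, float]:
    """Assumed reporting ranges: Reading Ease 0-100, Grade Level 0 to max_grade."""
    return min(100.0, max(0.0, ease)), min(max_grade, max(0.0, grade))


raw_ease, raw_grade = flesch_scores(PRIMER)
print(round(raw_ease, 1), round(raw_grade, 2))  # 104.3 -0.27
print(clip_like_word(raw_ease, raw_grade))      # (100.0, 0.0)
```

The same clipping assumption also illustrates the ceiling discussed above: any text whose raw grade exceeds 12 would be reported as 12.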
Applying the Flesch Reading Ease and Flesch-Kincaid Grade Level instruments to a sample of student writing yields valuable information, and even simpler surface features carry meaning. Kaplan et al. have demonstrated, for example, that essay length has a relatively large impact on the prediction of human ratings. Their study found that off-the-shelf computer-based writing assessment programs that use word counts predicted human ratings of essays exactly 60 percent of the time and within one score point 90 percent of the time. The researchers therefore concluded that essay length "continues to measure something appearing to be a proxy for component(s) of writing skills not directly measured by the grammar checkers."23 Surface features such as word count are thus not isolated features of writing and should not be dismissed as mere indicators of fluency and readability.24,25
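The two statistics behind Kaplan et al.'s 60 and 90 percent figures are exact agreement and adjacent agreement (within one score point), and both are simple proportions. The Python sketch below computes them for paired machine and human scores. The scores are invented for illustration and are not data from Kaplan et al. or from this study.

```python
def agreement_rates(machine: list[int], human: list[int]) -> tuple[float, float]:
    """Return (exact, adjacent) agreement between paired integer essay scores.

    "Adjacent" means the two scores differ by at most one point, the
    "within one score point" criterion described in the text.
    """
    if not machine or len(machine) != len(human):
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(machine)
    exact = sum(m == h for m, h in zip(machine, human)) / n
    adjacent = sum(abs(m - h) <= 1 for m, h in zip(machine, human)) / n
    return exact, adjacent


# Hypothetical scores on a 1-6 scale, for illustration only.
machine_scores = [3, 4, 2, 5, 3, 4, 1, 3, 4, 2]
human_scores = [3, 4, 3, 5, 2, 4, 1, 5, 4, 2]
print(agreement_rates(machine_scores, human_scores))  # (0.7, 0.9)
```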
| Results |
|---|
Summary measures such as the average number of sentences and the average number of sentences per paragraph varied widely. The percentage of sentences written in the passive voice was particularly variable, ranging from zero to 73 percent. Several measures differed significantly among classes: total number of words per essay, total number of characters per essay, total number of paragraphs per essay, total number of sentences per essay, average number of words per sentence, and percentage of passive sentences per essay. Table 2 provides details of counts, summary measures, and the results of a one-way analysis of variance comparing the essays produced by the three consecutive first-year classes studied.
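As a reminder of what the one-way analysis of variance in Table 2 computes, the Python sketch below derives the F statistic from between-group and within-group sums of squares. The toy groups are hypothetical, not the study's class data. A p-value would then come from the F distribution with the returned degrees of freedom, for example via a statistics package such as SciPy's `scipy.stats.f_oneway`; the sketch stops at F to stay dependency-free.

```python
from statistics import mean


def one_way_anova(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy groups standing in for three classes.
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```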
Readability measures were calculated and examined for the total population, and were also compared by class and by gender. The Flesch Reading Ease score for the entire population ranged from 14.6 to 74.0 with a mean of 50.561, indicating that the essays were moderately difficult to read. The essays produced by the freshman classes of 2002 and 2003 were somewhat more difficult to read than those produced by the class entering in 2001. This is clearly shown in the box plots comparing the performance of the three classes on this measure presented in Figure 1. These plots allow the comparison of medians and quartiles among the three groups. The box plots show that the median score for the entering class of 2001 was somewhat higher than that of the other groups, indicating that this class produced essays that were somewhat easier (simpler and less complex) to read. In addition, the quartiles show greater variation among the members of the entering class of 2001 than among the other classes. A one-way analysis of variance comparing Reading Ease scores among classes confirmed that the observed differences were significant. Table 3 provides the results of this analysis.
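The box-plot comparison rests on each group's median and quartiles. The Python sketch below computes that summary with the standard library's `statistics.quantiles`, which defaults to the "exclusive" interpolation method. Other statistical packages may interpolate quartiles slightly differently, so values need not match a given plotting program exactly. The scores shown are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(scores: list[float]) -> dict:
    """Five-number summary plus interquartile range (IQR) for one group.

    Uses Python's default 'exclusive' quartile method; other packages may
    interpolate quartiles differently.
    """
    q1, median, q3 = quantiles(scores, n=4)
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
        "iqr": q3 - q1,  # a wider IQR indicates more spread within the class
    }


# Hypothetical Reading Ease scores for one entering class.
class_scores = [38.0, 45.5, 52.0, 57.5, 61.0, 66.5, 70.0, 74.0]
summary = box_plot_summary(class_scores)
print(summary["median"], summary["iqr"])
```

Comparing the `median` and `iqr` entries across classes corresponds to reading the center line and box height of each box plot.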
The Flesch Reading Ease score and the Flesch-Kincaid Grade Level measure different but related aspects of readability. To examine the relationship between these two measures, the bivariate (Pearson's) correlation was calculated for the overall population. The two measures showed a high negative correlation (r = -.834, two-tailed p < .001). A negative correlation is expected because high scores on the two instruments point in opposite directions: a high Flesch-Kincaid Grade Level score indicates that more reading skill is needed to understand a piece of written work, whereas a high Flesch Reading Ease score indicates that the material is easy to read. Thus, the results of these two measures should be negatively correlated, as was found in this study.
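Because both standard Flesch formulas are linear in the same two ratios (words per sentence and syllables per word) with opposite-signed weights, a strong negative correlation is largely expected. It is not exactly -1, because the two formulas weight the ratios in different proportions. The Python sketch below illustrates this with a hand-written Pearson coefficient applied to invented essay profiles. The profiles are hypothetical, not this study's data. Python 3.10 and later also provide `statistics.correlation`.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical essays: (average words per sentence, average syllables per word).
essays = [(12, 1.40), (18, 1.50), (22, 1.70), (15, 1.60), (25, 1.55)]
ease = [206.835 - 1.015 * w - 84.6 * s for w, s in essays]
grade = [0.39 * w + 11.8 * s - 15.59 for w, s in essays]
print(round(pearson_r(ease, grade), 2))  # strongly negative for these profiles
```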
| Discussion |
|---|
In this study, we tested the feasibility of evaluating the readability of student essays using two easily accessible instruments. Written assignments generated by three successive entering classes of dental students were evaluated using the Flesch Reading Ease score and the Flesch-Kincaid Grade Level indicator. These measures are included in virtually all versions of Microsoft Word, the de facto standard word processing program in most university settings. Though not the main purpose of the study, the data collected provide a useful baseline for assessing the performance of subsequent entering dental students as well as the future performance of these students as they move through dental school.
Assessing the readability of documents produced by students in their regular coursework was an efficient and effective mechanism for assessing the sophistication, though not the technical quality, of students' written work. When coupled with the required electronic submission of written materials, the analysis of Reading Ease scores and Grade Level was straightforward, requiring only the grammar and spelling checker provided in Microsoft Word to calculate these measures and provide document statistics. These data can then be easily extracted and analyzed.
The automated identification of spelling and grammar errors could also be used to give students additional feedback on their performance. In addition, it can provide inputs for developing writing skills programs designed to address problem areas in written communication. Further, individualized data allow the development of customized remediation programs for students with serious deficiencies in written communication skills.
These data suggest that entering dental students have a broad range of writing skills. Although admission measures such as the Dental Admission Test (DAT) suggest great homogeneity among applicants to dental school on a number of variables, the scores from this study suggest considerable heterogeneity in students' writing skills. Overall, the grade-equivalent writing level of all three classes was below the eleventh-grade comprehension level, with some students producing written material at a level of sophistication generally expected of middle school children. Although students were instructed to demonstrate their best work, Reading Ease scores suggested only a moderate level of sophistication in the work they produced.
The instruments used were sensitive to performance differences among classes when coupled with a basic analysis using box plots and one-way analysis of variance. Issues of sample size and privacy prevented the analysis of performance based on race, ethnicity, country of origin, or primary language of students. However, since the New Jersey Dental School is among the most culturally diverse in the nation, it is likely that these variables would have a measurable influence on performance on written tasks and should be assessed.
The Flesch Reading Ease score and the Flesch-Kincaid Grade Level indicator are readily available and easy to use because they are incorporated into widely used word processing software. However, numerous alternatives exist. Among the most widely used readability tests are the Gunning "Fog" Readability Test,26 the Fry Readability Graph,27 and the McLaughlin "Smog" formula.28 A limitation of several of the most widely used instruments, including the Flesch-Kincaid Grade Level instrument, is their inability to deal adequately with text written above a twelfth-grade level. Further, formula-based readability measures tell only part of the story. As Alice Horning29 has pointed out, propositional analysis, discourse analysis, and cohesion analysis using appropriate instruments are necessary to obtain a true indication of readability. However, these analyses are more complex, less readily accessible, and require a higher level of sophistication to interpret.
| Conclusions |
|---|
In summary, the existing and widely available natural language assessment technology included in Microsoft Word shows promise as a vehicle for enhancing the assessment of dental students' written communication skills. The ease of use and minimal training needed to apply this technology can help mitigate the time-intensive nature of formal writing assessment.
As our study demonstrates, the combined use of WebCT technology (a system for electronically storing student work) and the Microsoft Word Flesch instruments (a system for electronically evaluating student work) meets the four criteria set by Kaplan et al. to ensure that the benefits of automated scoring exceed the costs.23 First, the information gained was defensible. The empirically established relationship between essay length and human ratings of essay quality lends value to the information obtained in this study. Indeed, the great variation in essay length, even though a length was specified in the assignments, suggests that the homogeneity reflected in a set of DAT scores may not capture the heterogeneity of that population when writing ability is considered. Such information is invaluable to administrators who must set curricular goals in the notoriously overcrowded first two years of dental school. Because writing has long been recognized as a way to help students better understand their disciplines, the importance of helping students gain stronger writing skills early in their dental careers cannot be overstated. As Sternglass reported based upon a longitudinal study of college-level writing, "As the writing and conceptual development of the students in the study reveal, over several semesters and even several years, through consistent instructional prodding both in writing and discipline area courses, students can develop an analytic stance that permits them to understand the significance of ideas in the particular field to the level where they become able to question some of the assumptions in that field."30 It would serve the dental profession well to consider these findings during the next review of the Commission on Dental Accreditation's curriculum requirements. We advocate the inclusion of the teaching of writing in the curriculum guidelines.
Second, the information collected was accurate, and third, the evaluation method is not coachable. Machine evaluation of surface features (aspects of writing, such as word count, that require no inference) is consistent and reliable. A student could, in principle, paste multiple copies of the same material into WebCT to inflate the word count, but such a technique would easily be identified by the human reader who assigns each student an individual grade and comment. As Powers et al. reported, stumping automated scoring systems is difficult even for sophisticated writers who try deliberately.31 In their study of attempts to stump E-Rater, the one participant who succeeded was a professor of computational linguistics who wrote several paragraphs and repeated them thirty-seven times. However, as the researchers concluded, such a technique would easily be identified and discounted by a human reader.
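The authors rely on the human reader to catch padding by repetition. As a purely hypothetical supplement, not something used in this study, a simple screen could flag essays containing verbatim repeated paragraphs before grading. The Python sketch below assumes blank-line paragraph breaks and detects only exact repeats after whitespace and case normalization, so lightly edited copies would slip through. The `min_words` threshold is an illustrative choice.

```python
from collections import Counter


def repeated_paragraphs(text: str, min_words: int = 5) -> dict[str, int]:
    """Return normalized paragraphs that appear more than once, with their counts.

    Paragraphs are assumed to be separated by blank lines; paragraphs shorter
    than min_words words are ignored to reduce false alarms.
    """
    normalized = [" ".join(p.split()).lower() for p in text.split("\n\n")]
    counts = Counter(p for p in normalized if len(p.split()) >= min_words)
    return {p: c for p, c in counts.items() if c > 1}


sample = (
    "Plaque control begins at home.\n\n"
    "Plaque control begins at home.\n\n"
    "Thank you.\n\n"
    "Floss daily to protect the gums."
)
print(repeated_paragraphs(sample))  # {'plaque control begins at home.': 2}
```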
Fourth, the evaluation method is cost-effective. Although some time was required to transfer the student writing samples from WebCT to Microsoft Word, this opportunity cost was not prohibitive for the present study.
There is, of course, much more work to be undertaken if Kaplan's four criteria are to be fully met. The information is defensible, but the face validity of machine scoring of surface features leaves much to be desired, however highly that information may correlate with measures designed to capture total ability. While the information gained in this study was accurate, further study, presently being undertaken by the authors, is needed to assess the writing ability of dental students on variables not included in the present study. A student's ability to think critically about the rhetorical specifications of an assignment (the way the student addresses its purpose, audience, and subject) can best be assessed through multiple observations of student writing contained in portfolios.32 Hence, by increasing accuracy through further study, we can increase the defensibility of this methodology. In addition, because portfolio scoring is well known to be time-consuming and costly, examining the relationship between portfolio scores and Flesch scores may identify ways to use cost-efficient, computer-based scores of surface features in conjunction with human-based portfolio scores of rhetorical ability. If such assessment information is then used to enhance instruction, increased use of technology may help bridge the distance between assessment and instruction.
| Footnotes |
|---|
| REFERENCES |
|---|