Linguistic features of writing quality

Danielle S. McNamara, Scott A. Crossley, Philip M. McCarthy

Research output: Contribution to journal (Article)

188 Scopus citations

Abstract

In this study, a corpus of expert-graded essays, scored with a standardized rubric, is computationally evaluated to identify the differences between essays rated as high and those rated as low. The automated tool Coh-Metrix is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imageability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by the CELEX log frequency for all words). None of the 26 validated indices of cohesion from Coh-Metrix showed differences between high- and low-proficiency essays, and no indices of cohesion correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.

Original language: English (US)
Pages (from-to): 57-86
Number of pages: 30
Journal: Written Communication
Volume: 27
Issue number: 1
State: Published - Jan 1 2010
Externally published: Yes

Keywords

  • Assessment
  • Coherence
  • Cohesion
  • Computational linguistics
  • Essay quality
  • Writing proficiency

ASJC Scopus subject areas

  • Communication
  • Literature and Literary Theory
