Linguistic features of writing quality

Danielle McNamara, Scott A. Crossley, Philip M. McCarthy

Research output: Contribution to journal › Article

177 Citations (Scopus)

Abstract

In this study, a corpus of essays graded by experts using a standardized scoring rubric is computationally evaluated to distinguish essays rated as high quality from those rated as low quality. The automated tool Coh-Metrix is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imageability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by the number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by the CELEX logarithmic frequency of all words). Of the 26 validated Coh-Metrix indices of cohesion examined, none showed differences between high- and low-proficiency essays, and none correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.
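The three most predictive indices named in the abstract combine a parse-based syntactic measure (words before the main verb), a lexical diversity measure (MTLD), and CELEX log word frequency. As a rough illustration of how one of these is computed, the Python sketch below implements a simplified MTLD: the 0.72 running type-token-ratio threshold and the forward/backward averaging follow McCarthy and Jarvis's published description of the measure, but the tokenization, partial-factor handling, and example sentences are assumptions for illustration only and do not reproduce the Coh-Metrix implementation. The frequency and syntactic indices would additionally require the CELEX database and a syntactic parser, which are not reproduced here.

import re

def tokenize(text):
    # Lowercase alphabetic tokens only; actual Coh-Metrix tokenization is more elaborate (assumption).
    return re.findall(r"[a-z]+", text.lower())

def mtld_one_pass(tokens, ttr_threshold=0.72):
    # Count "factors": stretches whose running type-token ratio (TTR) stays above the threshold.
    factors = 0.0
    types = set()
    count = 0
    ttr = 1.0
    for token in tokens:
        count += 1
        types.add(token)
        ttr = len(types) / count
        if ttr <= ttr_threshold:
            factors += 1.0      # a full factor is complete; start a fresh window
            types.clear()
            count = 0
    if count > 0:
        # Credit the leftover partial factor in proportion to how far TTR has fallen.
        factors += (1.0 - ttr) / (1.0 - ttr_threshold)
    return len(tokens) / factors if factors > 0 else float(len(tokens))

def mtld(text):
    # Average a forward and a backward pass over the token stream.
    tokens = tokenize(text)
    return (mtld_one_pass(tokens) + mtld_one_pass(tokens[::-1])) / 2.0

if __name__ == "__main__":
    varied = ("The committee deliberated at length, weighing several nuanced "
              "counterarguments before cautiously endorsing the revised proposal.")
    repetitive = "The dog ran and the dog ran and the dog ran to the big park."
    print(f"MTLD, varied wording:     {mtld(varied):.1f}")
    print(f"MTLD, repetitive wording: {mtld(repetitive):.1f}")

Run as a script, the varied sentence yields a much higher MTLD than the repetitive one, illustrating the direction of the effect reported in the study: higher-rated essays showed greater lexical diversity.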

Original language: English (US)
Pages (from-to): 57-86
Number of pages: 30
Journal: Written Communication
Volume: 27
Issue number: 1
DOI: 10.1177/0741088309351547
State: Published - January 2010
Externally published: Yes


Keywords

  • Assessment
  • Coherence
  • Cohesion
  • Computational linguistics
  • Essay quality
  • Writing proficiency

ASJC Scopus subject areas

  • Communication
  • Literature and Literary Theory

Cite this

McNamara, D., Crossley, S. A., & McCarthy, P. M. (2010). Linguistic features of writing quality. Written Communication, 27(1), 57-86. https://doi.org/10.1177/0741088309351547