Linguistic features of writing quality

Danielle S. McNamara; Scott A. Crossley; Philip M. McCarthy

doi:10.1177/0741088309351547

Research output: Journal article (peer-reviewed)

367 Scopus citations

Abstract

In this study, a corpus of expert-graded essays, based on a standardized scoring rubric, is computationally evaluated so as to distinguish the differences between those essays that were rated as high and those rated as low. The automated tool, Coh-Metrix, is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imagability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by Celex, logarithm for all words). Using 26 validated indices of cohesion from Coh-Metrix, none showed differences between high- and low-proficiency essays and no indices of cohesion correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.

Original language	English (US)
Pages (from-to)	57-86
Number of pages	30
Journal	Written Communication
Volume	27
Issue number	1
DOI	https://doi.org/10.1177/0741088309351547
Status	Published January 2010
Externally published	Yes

Keywords

Assessment
Coherence
Cohesion
Computational linguistics
Essay quality
Writing proficiency

ASJC Scopus subject areas

Communication
Literature and Literary Theory

Cite this

@article{08b537bdf69c4c298987d97927c52c7e,
  title = "Linguistic features of writing quality",
  abstract = "In this study, a corpus of expert-graded essays, based on a standardized scoring rubric, is computationally evaluated so as to distinguish the differences between those essays that were rated as high and those rated as low. The automated tool, Coh-Metrix, is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imagability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by Celex, logarithm for all words). Using 26 validated indices of cohesion from Coh-Metrix, none showed differences between high- and low-proficiency essays and no indices of cohesion correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.",
  keywords = "Assessment, Coherence, Cohesion, Computational linguistics, Essay quality, Writing proficiency",
  author = "McNamara, {Danielle S.} and Crossley, {Scott A.} and McCarthy, {Philip M.}",
  year = "2010",
  month = jan,
  doi = "10.1177/0741088309351547",
  language = "English (US)",
  volume = "27",
  pages = "57--86",
  journal = "Written Communication",
  issn = "0741-0883",
  publisher = "SAGE Publications Inc.",
  number = "1",
}

TY  - JOUR
T1  - Linguistic features of writing quality
AU  - McNamara, Danielle S.
AU  - Crossley, Scott A.
AU  - McCarthy, Philip M.
PY  - 2010/1
Y1  - 2010/1
N2  - In this study, a corpus of expert-graded essays, based on a standardized scoring rubric, is computationally evaluated so as to distinguish the differences between those essays that were rated as high and those rated as low. The automated tool, Coh-Metrix, is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imagability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by Celex, logarithm for all words). Using 26 validated indices of cohesion from Coh-Metrix, none showed differences between high- and low-proficiency essays and no indices of cohesion correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.
AB  - In this study, a corpus of expert-graded essays, based on a standardized scoring rubric, is computationally evaluated so as to distinguish the differences between those essays that were rated as high and those rated as low. The automated tool, Coh-Metrix, is used to examine the degree to which high- and low-proficiency essays can be predicted by linguistic indices of cohesion (i.e., coreference and connectives), syntactic complexity (e.g., number of words before the main verb, sentence structure overlap), the diversity of words used by the writer, and characteristics of words (e.g., frequency, concreteness, imagability). The three most predictive indices of essay quality in this study were syntactic complexity (as measured by number of words before the main verb), lexical diversity (as measured by the Measure of Textual Lexical Diversity), and word frequency (as measured by Celex, logarithm for all words). Using 26 validated indices of cohesion from Coh-Metrix, none showed differences between high- and low-proficiency essays and no indices of cohesion correlated with essay ratings. These results indicate that the textual features that characterize good student writing are not aligned with those features that facilitate reading comprehension. Rather, essays judged to be of higher quality were more likely to contain linguistic features associated with text difficulty and sophisticated language.
KW  - Assessment
KW  - Coherence
KW  - Cohesion
KW  - Computational linguistics
KW  - Essay quality
KW  - Writing proficiency
UR  - http://www.scopus.com/inward/record.url?scp=73249128238&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=73249128238&partnerID=8YFLogxK
U2  - 10.1177/0741088309351547
DO  - 10.1177/0741088309351547
M3  - Article
AN  - SCOPUS:73249128238
SN  - 0741-0883
VL  - 27
SP  - 57
EP  - 86
JO  - Written Communication
JF  - Written Communication
IS  - 1
ER  - 
