Coh-Metrix: Capturing Linguistic Features of Cohesion

Danielle S. McNamara, Max M. Louwerse, Philip M. McCarthy, Arthur C. Graesser

Research output: Contribution to journal, Article

136 Scopus citations

Abstract

This study addresses the need in discourse psychology for computational techniques that analyze text on multiple levels of cohesion and text difficulty. Discourse psychologists often investigate phenomena related to discourse processing using lengthy texts containing multiple paragraphs, as opposed to single word and sentence stimuli. Characterizing such texts in terms of cohesion and coherence is challenging. Some computational tools are available, but they are either fragmented over different databases or they assess single, specific features of text. Coh-Metrix is a computational linguistic tool that measures text cohesion and text difficulty on a range of word, sentence, paragraph, and discourse dimensions. This study investigated the validity of Coh-Metrix as a measure of cohesion in text using stimuli from published discourse psychology studies as a benchmark. Results showed that Coh-Metrix indexes of cohesion (individually and combined) significantly distinguished the high- versus low-cohesion versions of these texts. The results also showed that commonly used readability indexes (e.g., Flesch-Kincaid) inappropriately distinguished between low- and high-cohesion texts. These results provide a validation of Coh-Metrix, thereby paving the way for its use by researchers in cognitive science, discourse processes, and education, as well as for textbook writers, professionals in instructional design, and instructors.

Original language: English (US)
Pages (from-to): 292-330
Number of pages: 39
Journal: Discourse Processes
Volume: 47
Issue number: 4
State: Published - May 1, 2010


ASJC Scopus subject areas

  • Communication
  • Language and Linguistics
  • Linguistics and Language
