Coh-Metrix: Capturing Linguistic Features of Cohesion

Danielle McNamara, Max M. Louwerse, Philip M. McCarthy, Arthur C. Graesser

Research output: Contribution to journal (Article)

131 Citations (Scopus)

Abstract

This study addresses the need in discourse psychology for computational techniques that analyze text on multiple levels of cohesion and text difficulty. Discourse psychologists often investigate phenomena related to discourse processing using lengthy texts containing multiple paragraphs, as opposed to single word and sentence stimuli. Characterizing such texts in terms of cohesion and coherence is challenging. Some computational tools are available, but they are either fragmented over different databases or they assess single, specific features of text. Coh-Metrix is a computational linguistic tool that measures text cohesion and text difficulty on a range of word, sentence, paragraph, and discourse dimensions. This study investigated the validity of Coh-Metrix as a measure of cohesion in text using stimuli from published discourse psychology studies as a benchmark. Results showed that Coh-Metrix indexes of cohesion (individually and combined) significantly distinguished the high- versus low-cohesion versions of these texts. The results also showed that commonly used readability indexes (e.g., Flesch-Kincaid) inappropriately distinguished between low- and high-cohesion texts. These results provide a validation of Coh-Metrix, thereby paving the way for its use by researchers in cognitive science, discourse processes, and education, as well as for textbook writers, professionals in instructional design, and instructors.
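As background for the readability comparison mentioned in the abstract: the Flesch-Kincaid grade level is computed from only two surface ratios, words per sentence and syllables per word, so it has no term that reflects links between sentences. The following is a minimal illustrative sketch, not code from the article or from Coh-Metrix. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for simplicity.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption of this sketch): count runs of vowels,
    and give every word at least one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Two toy versions of the same content (made up for this sketch):
    # the second adds a causal connective, which joins the sentences.
    without_connective = "The rain fell. The game stopped."
    with_connective = "The rain fell, so the game stopped."
    print(flesch_kincaid_grade(without_connective))
    print(flesch_kincaid_grade(with_connective))
```

In this toy pair, adding the connective lengthens the sentence, so the formula assigns the connected version a higher grade level even though the causal link is now explicit. This is only an illustration of how the formula behaves, not a reproduction of the study's analysis.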

Original language: English (US)
Pages (from-to): 292-330
Number of pages: 39
Journal: Discourse Processes
Volume: 47
Issue number: 4
DOI: https://doi.org/10.1080/01638530902959943
State: Published - May 2010
Externally published: Yes

Fingerprint

Linguistics
Computational linguistics
Textbooks
Education
Discourse
Processing
Stimulus
Psychology studies
Linguistic features
Cohesion
Psychologist
Instructor
Psychology
Writer
Science

ASJC Scopus subject areas

  • Linguistics and Language
  • Communication
  • Language and Linguistics

Cite this

Coh-Metrix: Capturing Linguistic Features of Cohesion. / McNamara, Danielle; Louwerse, Max M.; McCarthy, Philip M.; Graesser, Arthur C.

In: Discourse Processes, Vol. 47, No. 4, 05.2010, p. 292-330.

McNamara, D, Louwerse, MM, McCarthy, PM & Graesser, AC 2010, 'Coh-Metrix: Capturing Linguistic Features of Cohesion', Discourse Processes, vol. 47, no. 4, pp. 292-330. https://doi.org/10.1080/01638530902959943
@article{76d3e67e65f24bbdb3a6ed3f913a9dce,
title = "Coh-Metrix: Capturing Linguistic Features of Cohesion",
author = "McNamara, Danielle and Louwerse, {Max M.} and McCarthy, {Philip M.} and Graesser, {Arthur C.}",
year = "2010",
month = "5",
doi = "10.1080/01638530902959943",
language = "English (US)",
volume = "47",
pages = "292--330",
journal = "Discourse Processes",
issn = "0163-853X",
publisher = "Routledge",
number = "4",

}

TY - JOUR

T1 - Coh-Metrix: Capturing Linguistic Features of Cohesion

AU - McNamara, Danielle

AU - Louwerse, Max M.

AU - McCarthy, Philip M.

AU - Graesser, Arthur C.

PY - 2010/5

Y1 - 2010/5

UR - http://www.scopus.com/inward/record.url?scp=79952885435&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952885435&partnerID=8YFLogxK

U2 - 10.1080/01638530902959943

DO - 10.1080/01638530902959943

M3 - Article

AN - SCOPUS:79952885435

VL - 47

SP - 292

EP - 330

JO - Discourse Processes

JF - Discourse Processes

SN - 0163-853X

IS - 4

ER -