The Comparative and Combined Contributions of n-Grams, Coh-Metrix Indices and Error Types in the L1 Classification of Learner Texts

Scott Jarvis, Yves Bestgen, Scott A. Crossley, Sylviane Granger, Magali Paquot, Jennifer Thewissen, Danielle McNamara

Research output: Chapter in Book/Report/Conference proceeding › Chapter

5 Scopus citations

Abstract

Chapters 3 through 5 of this book have given an indication of the levels of L1 detection accuracy that can be attained through classification analyses whose predictor variables are individual words and multiword sequences (or n-grams, see Chapter 3), measures of coherence, lexical semantics and lexical diversity (or Coh-Metrix (CM) indices, see Chapter 4 and McNamara & Graesser, in press), and the types and numbers of errors that learners make in their L2 English writing (see Chapter 5). The results of these analyses show L1 classification accuracies from roughly 54% for n-grams to roughly 65% for both errors and CM indices. All three analyses were performed with data extracted from the International Corpus of Learner English (ICLE; see Granger et al., 2009) using similar selection criteria (e.g. argumentative essays between 500 and 1000 words in length), but they differ in relation to the number of texts analyzed (2033 in the n-gram analysis, 903 in the CM analysis, and 223 in the error analysis) as well as in relation to the number of L1s under investigation (12, 4 and 3, respectively). The purpose of the present chapter is to perform a series of L1 detection analyses on essays from three language groups (French, German and Spanish), applying the features (or variables) from all three studies to a single dataset in order to examine both the comparative and combined usefulness of n-grams, CM indices and error measures for this type of research.
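As an illustration of what applying all three feature sets to a single dataset could look like, the following is a minimal sketch that merges n-gram counts, CM indices and error counts into one feature vector per essay. It is not the chapter's actual procedure, which the abstract does not describe. The function names `word_ngrams` and `combine_features`, the feature names and all numeric values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def word_ngrams(text, n_max=2):
    """Count word n-grams of length 1..n_max in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def combine_features(**feature_sets):
    """Merge named feature dicts into one dict, prefixing each key with its set name."""
    combined = {}
    for set_name, features in feature_sets.items():
        for key, value in features.items():
            combined[f"{set_name}:{key}"] = float(value)
    return combined


# Hypothetical data for one essay; none of these values come from the chapter.
essay = "in my opinion the television is the opium of the masses"
ngram_feats = word_ngrams(essay, n_max=2)
cm_feats = {"lexical_diversity": 0.72, "word_concreteness": 410.0}
error_feats = {"article_errors": 2, "spelling_errors": 1}

# One combined vector that a classifier of the reader's choice could consume.
vector = combine_features(ngram=ngram_feats, cm=cm_feats, error=error_feats)
```

Prefixing each key with the name of its feature set keeps the three sources distinguishable, so a classifier can be trained on one set alone (the comparative case) or on the merged vector (the combined case).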

Original language: English (US)
Title of host publication: Approaching Language Transfer through Text Classification
Subtitle of host publication: Explorations in the Detection-Based Approach
Publisher: Channel View Publications
Pages: 154-177
Number of pages: 24
ISBN (Electronic): 9781847696991
ISBN (Print): 9781847696977
State: Published - Mar 14 2012

ASJC Scopus subject areas

  • General Arts and Humanities
  • General Social Sciences
