Detecting the First Language of Second Language Writers Using Automated Indices of Cohesion, Lexical Sophistication, Syntactic Complexity and Conceptual Knowledge

Scott Jarvis, Danielle S. McNamara

Research output: Chapter in Book/Report/Conference proceeding > Chapter

1 Scopus citation

Abstract

Can we reliably predict the first language (L1) of a second language (L2) writer from a text that the writer has produced? Do aspects of the L2 writer’s L1 leak through sufficiently for it to be detected? As discussed in the first chapter of this volume, intuition suggests that experienced L2 teachers would be able to predict, with some accuracy, the L1 of an L2 writer. Indeed, if particular linguistic features and rhetorical strategies transfer from an L1 to an L2 in predictable ways, the writer’s L1 may be even more apparent to a teacher who shares that L1 with the student writer. Similarly, if the linguistic features and rhetorical strategies transfer reliably and with sufficient frequency, crosslinguistic influences should be detectable by computational linguistic indices. With this notion in mind, the objective of this study is to examine the degree to which the L1 of an L2 writer can be detected using computational linguistic indices. We use the computational tool Coh-Metrix (McNamara & Graesser, in press) to identify linguistic differences in a corpus of L2 essays written in English by speakers of several L1s (Czech, Finnish, German and Spanish). We then develop a statistical model to classify the language background of the writer. Such an approach not only allows us to detect the linguistic patterns associated with specific language backgrounds, but also allows us to test the strength of these differences in classifying texts into specific language groups. Because a statistical model would allow us to predict L1 background with a degree of accuracy that potentially exceeds chance, the success of these models would in turn provide support for theoretical arguments related to crosslinguistic influences (Jarvis, 2010).

Original language: English (US)
Title of host publication: Approaching Language Transfer through Text Classification
Subtitle of host publication: Explorations in the Detection-Based Approach
Publisher: Channel View Publications
Pages: 106-126
Number of pages: 21
ISBN (Electronic): 9781847696991
ISBN (Print): 9781847696977
State: Published - Mar 14 2012

ASJC Scopus subject areas

  • General Arts and Humanities
  • General Social Sciences
