Predicting the proficiency level of language learners using lexical indices

Scott A. Crossley, Tom Salsbury, Danielle McNamara

Research output: Contribution to journal › Article

35 Citations (Scopus)

Abstract

This study explores how second language (L2) texts written by learners at various proficiency levels can be classified using computational indices that characterize lexical competence. For this study, 100 writing samples taken from 100 L2 learners were analyzed using lexical indices reported by the computational tool Coh-Metrix. The L2 writing samples were categorized into beginning, intermediate, and advanced groupings based on the TOEFL and ACT ESL Compass scores of the writer. A discriminant function analysis was used to predict the level categorization of the texts using lexical indices related to breadth of lexical knowledge (word frequency, lexical diversity), depth of lexical knowledge (hypernymy, polysemy, semantic co-referentiality, and word meaningfulness), and access to core lexical items (word concreteness, familiarity, and imagability). The strongest predictors of an individual's proficiency level were word imagability, word frequency, lexical diversity, and word familiarity. In total, the indices correctly classified 70% of the texts based on proficiency level in both a training and a test set. The authors argue for the applicability of a statistical model as a method to investigate lexical competence across language levels, as a method to assess L2 lexical development, and as a method to classify L2 proficiency.
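As a rough illustration of one breadth-of-knowledge index named in the abstract, lexical diversity, the sketch below computes a simple type-token ratio in Python. This is an assumption made for illustration: the abstract does not say which diversity measure Coh-Metrix reported for the study, and the function name `type_token_ratio` and the tokenization rule are hypothetical, not taken from the paper or the tool.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return unique word types divided by total word tokens (0.0 for empty text).

    Illustrative stand-in only; not the index used in the study.
    """
    # Lowercase the text and treat alphabetic runs (with internal apostrophes) as tokens.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample sentence: 11 tokens, 5 distinct types.
sample = "The cat saw the dog and the dog saw the cat"
print(type_token_ratio(sample))
```

A plain type-token ratio tends to fall as texts get longer, so it is only comparable across writing samples of similar length; a score like this would be one input feature among several in a classifier such as the discriminant function analysis the abstract describes.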

Original language: English (US)
Pages (from-to): 243-263
Number of pages: 21
Journal: Language Testing
Volume: 29
Issue number: 2
DOIs: https://doi.org/10.1177/0265532211419331
State: Published - Apr 2012
Externally published: Yes

Fingerprint

language
grouping
writer
semantics
Proficiency
Language
Computational
Familiarity
Word Frequency
Lexical Knowledge

Keywords

  • frequency
  • language proficiency
  • lexical competence
  • lexical diversity
  • second language acquisition
  • word familiarity
  • word imagability

ASJC Scopus subject areas

  • Linguistics and Language
  • Social Sciences (miscellaneous)
  • Language and Linguistics

Cite this

Predicting the proficiency level of language learners using lexical indices. / Crossley, Scott A.; Salsbury, Tom; McNamara, Danielle.

In: Language Testing, Vol. 29, No. 2, 04.2012, p. 243-263.

Crossley, Scott A. ; Salsbury, Tom ; McNamara, Danielle. / Predicting the proficiency level of language learners using lexical indices. In: Language Testing. 2012 ; Vol. 29, No. 2. pp. 243-263.
@article{3e628c0609b04fe2a7b444fc0c81b9c2,
title = "Predicting the proficiency level of language learners using lexical indices",
abstract = "This study explores how second language (L2) texts written by learners at various proficiency levels can be classified using computational indices that characterize lexical competence. For this study, 100 writing samples taken from 100 L2 learners were analyzed using lexical indices reported by the computational tool Coh-Metrix. The L2 writing samples were categorized into beginning, intermediate, and advanced groupings based on the TOEFL and ACT ESL Compass scores of the writer. A discriminant function analysis was used to predict the level categorization of the texts using lexical indices related to breadth of lexical knowledge (word frequency, lexical diversity), depth of lexical knowledge (hypernymy, polysemy, semantic co-referentiality, and word meaningfulness), and access to core lexical items (word concreteness, familiarity, and imagability). The strongest predictors of an individual's proficiency level were word imagability, word frequency, lexical diversity, and word familiarity. In total, the indices correctly classified 70{\%} of the texts based on proficiency level in both a training and a test set. The authors argue for the applicability of a statistical model as a method to investigate lexical competence across language levels, as a method to assess L2 lexical development, and as a method to classify L2 proficiency.",
keywords = "frequency, language proficiency, lexical competence, lexical diversity, second language acquisition, word familiarity, word imagability",
author = "Crossley, {Scott A.} and Tom Salsbury and Danielle McNamara",
year = "2012",
month = "4",
doi = "10.1177/0265532211419331",
language = "English (US)",
volume = "29",
pages = "243--263",
journal = "Language Testing",
issn = "0265-5322",
publisher = "SAGE Publications Ltd",
number = "2",

}

TY - JOUR

T1 - Predicting the proficiency level of language learners using lexical indices

AU - Crossley, Scott A.

AU - Salsbury, Tom

AU - McNamara, Danielle

PY - 2012/4

Y1 - 2012/4

N2 - This study explores how second language (L2) texts written by learners at various proficiency levels can be classified using computational indices that characterize lexical competence. For this study, 100 writing samples taken from 100 L2 learners were analyzed using lexical indices reported by the computational tool Coh-Metrix. The L2 writing samples were categorized into beginning, intermediate, and advanced groupings based on the TOEFL and ACT ESL Compass scores of the writer. A discriminant function analysis was used to predict the level categorization of the texts using lexical indices related to breadth of lexical knowledge (word frequency, lexical diversity), depth of lexical knowledge (hypernymy, polysemy, semantic co-referentiality, and word meaningfulness), and access to core lexical items (word concreteness, familiarity, and imagability). The strongest predictors of an individual's proficiency level were word imagability, word frequency, lexical diversity, and word familiarity. In total, the indices correctly classified 70% of the texts based on proficiency level in both a training and a test set. The authors argue for the applicability of a statistical model as a method to investigate lexical competence across language levels, as a method to assess L2 lexical development, and as a method to classify L2 proficiency.

AB - This study explores how second language (L2) texts written by learners at various proficiency levels can be classified using computational indices that characterize lexical competence. For this study, 100 writing samples taken from 100 L2 learners were analyzed using lexical indices reported by the computational tool Coh-Metrix. The L2 writing samples were categorized into beginning, intermediate, and advanced groupings based on the TOEFL and ACT ESL Compass scores of the writer. A discriminant function analysis was used to predict the level categorization of the texts using lexical indices related to breadth of lexical knowledge (word frequency, lexical diversity), depth of lexical knowledge (hypernymy, polysemy, semantic co-referentiality, and word meaningfulness), and access to core lexical items (word concreteness, familiarity, and imagability). The strongest predictors of an individual's proficiency level were word imagability, word frequency, lexical diversity, and word familiarity. In total, the indices correctly classified 70% of the texts based on proficiency level in both a training and a test set. The authors argue for the applicability of a statistical model as a method to investigate lexical competence across language levels, as a method to assess L2 lexical development, and as a method to classify L2 proficiency.

KW - frequency

KW - language proficiency

KW - lexical competence

KW - lexical diversity

KW - second language acquisition

KW - word familiarity

KW - word imagability

UR - http://www.scopus.com/inward/record.url?scp=84860161784&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84860161784&partnerID=8YFLogxK

U2 - 10.1177/0265532211419331

DO - 10.1177/0265532211419331

M3 - Article

VL - 29

SP - 243

EP - 263

JO - Language Testing

JF - Language Testing

SN - 0265-5322

IS - 2

ER -