Predicting lexical proficiency in language learner texts using computational indices

Scott A. Crossley, Tom Salsbury, Danielle S. McNamara, Scott Jarvis

Research output: Contribution to journal › Article › peer-review

101 Scopus citations


The authors present a model of lexical proficiency based on lexical indices related to vocabulary size, depth of lexical knowledge, and accessibility to core lexical items. The lexical indices used in this study come from the computational tool Coh-Metrix and include word length scores, lexical diversity values, word frequency counts, hypernymy values, polysemy values, semantic co-referentiality, word meaningfulness, word concreteness, word imageability, and word familiarity. Human raters evaluated a corpus of 240 written texts using a standardized rubric of lexical proficiency. To ensure a variety of text levels, the corpus comprised 60 texts each from beginning, intermediate, and advanced second language (L2) adult English learners. The L2 texts were collected longitudinally from 10 English learners. In addition, 60 texts from native English speakers were collected. The holistic scores from the trained human raters were then correlated with a variety of lexical indices. The researchers found that lexical diversity, word hypernymy values, and content word frequency explain 44% of the variance of the human evaluations of lexical proficiency in the examined writing samples. The findings represent an important step in the development of a model of lexical proficiency that incorporates both vocabulary size and depth of lexical knowledge features.
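
To make the phrase "explain 44% of the variance" concrete, the following is a minimal sketch of the general idea, not the authors' procedure. It assumes a simple type-token ratio as a stand-in for a lexical diversity index (Coh-Metrix computes its own, more robust measures) and uses a single predictor, for which the variance explained is the squared Pearson correlation; the study combined three indices. The texts, scores, and function names are hypothetical.

```python
import math

def type_token_ratio(text):
    """Unique words divided by total words: a crude lexical diversity proxy."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data for illustration only; not taken from the study.
texts = [
    "the cat sat on the mat and the cat sat",
    "a dog ran to the park and a dog ran home",
    "students wrote essays about travel food and music",
    "she described the ancient harbor with vivid precise detail",
]
human_scores = [1.0, 2.0, 3.0, 4.0]  # hypothetical holistic ratings

diversity = [type_token_ratio(t) for t in texts]
r = pearson_r(diversity, human_scores)

# With one predictor, the share of variance explained equals r squared.
variance_explained = r ** 2
print(f"r = {r:.3f}, variance explained = {variance_explained:.1%}")
```

With several predictors, as in the study, the analogous quantity would be the R² of a multiple regression rather than a single squared correlation.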

Original language: English (US)
Pages (from-to): 561-580
Number of pages: 20
Journal: Language Testing
Issue number: 4
State: Published - Oct 2011
Externally published: Yes


  • computational linguistics
  • corpus linguistics
  • depth of lexical knowledge
  • hypernymy
  • lexical diversity
  • lexical frequency
  • lexical proficiency
  • vocabulary size

ASJC Scopus subject areas

  • Language and Linguistics
  • Social Sciences (miscellaneous)
  • Linguistics and Language

