Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications

Scott A. Crossley, Tom Cobb, Danielle McNamara

Research output: Contribution to journal › Article › peer-review

66 Scopus citations

Abstract

In assessments of second language (L2) writing, lexical quality typically accounts for more variance than other factors, and the most readily operationalized measure of lexical quality is word frequency. This study compares two methods of automatically assessing word frequency in learner productions. The first, a band-based method, involves lexical frequency profiling, a procedure that first groups individual words into families and then sorts these families into corpus-based frequency bands. The second, a count-based method, assigns a normalized corpus frequency count to each individual word form used, yielding an average count for a text. Both band-based and count-based methods were used to analyze 100 L2 learner and 30 native speaker freewrites that had been classified according to proficiency level (i.e., native speakers and beginning, intermediate, and advanced L2 learners). Machine learning algorithms were used to classify the texts into their respective proficiency levels. The results indicate that count-based word frequency indices accurately classified 58% of the texts, while band-based indices yielded accuracies that were between 10% and 22% lower than those of the count-based indices.
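The contrast between the two kinds of index can be illustrated with a minimal Python sketch. It is not the tooling used in the study: the frequency values, the family lookup, the band assignments, and the function names `count_based_index` and `band_based_profile` are all hypothetical, chosen only to show how one approach averages per-form counts while the other reports the share of tokens per frequency band. Skipping tokens that are missing from the frequency list is likewise an assumption of this sketch.

```python
from collections import Counter

# Hypothetical toy data: normalized (per-million) counts for individual word forms.
FORM_FREQ_PER_MILLION = {
    "the": 50000.0, "cat": 60.0, "cats": 25.0, "sat": 40.0,
    "sits": 12.0, "on": 7000.0, "mat": 8.0,
}

# Hypothetical family lookup: inflected form -> family headword.
FAMILY_OF = {"cats": "cat", "sat": "sit", "sits": "sit"}

# Hypothetical band assignment for family headwords (1 = most frequent band).
BAND_OF_FAMILY = {"the": 1, "on": 1, "sit": 1, "cat": 2, "mat": 3}

OFF_LIST = "off-list"


def count_based_index(tokens):
    """Mean normalized frequency over the tokens found in the frequency list.

    Assumption of this sketch: tokens absent from the list are skipped.
    """
    counts = [FORM_FREQ_PER_MILLION[t] for t in tokens if t in FORM_FREQ_PER_MILLION]
    return sum(counts) / len(counts) if counts else 0.0


def band_based_profile(tokens):
    """Proportion of tokens whose word family falls into each frequency band."""
    if not tokens:
        return {}
    bands = Counter()
    for token in tokens:
        family = FAMILY_OF.get(token, token)  # group the form into its family
        bands[BAND_OF_FAMILY.get(family, OFF_LIST)] += 1
    total = len(tokens)
    return {band: n / total for band, n in bands.items()}


tokens = "the cat sits on the mat".split()
mean_freq = count_based_index(tokens)   # a single average count for the text
profile = band_based_profile(tokens)    # share of tokens per band
```

In this sketch the count-based index collapses a text to one number computed from individual word forms, whereas the band-based profile first maps each form to a family and then reports a coarse distribution over bands.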

Original language: English (US)
Pages (from-to): 965-981
Number of pages: 17
Journal: System
Volume: 41
Issue number: 4
State: Published - Dec 2013

Keywords

  • Active and passive lexical proficiency
  • Band-based frequency measures
  • Computational linguistics
  • Count-based frequency measures
  • Frequency analysis
  • Frequency lists
  • Learner corpora
  • Lexical sophistication

ASJC Scopus subject areas

  • Language and Linguistics
  • Education
  • Linguistics and Language
