Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications

Scott A. Crossley; Tom Cobb; Danielle McNamara

doi:10.1016/j.system.2013.08.002

Research output: Journal article (peer-reviewed)

66 Scopus citations

Abstract

In assessments of second language (L2) writing, quality of lexis typically claims more variance than other factors, and the most readily operationalized measure of lexical quality is word frequency. This study compares two methods of automatically assessing word frequency in learner productions. The first method, a band-based method, involves lexical frequency profiling, a procedure that first groups individual words into families and then sorts these into corpus-based frequency bands. The second method, a count-based method, assigns a normalized corpus frequency count to each individual word form used, yielding an average count for a text. Both band- and count-based methods were used to analyze 100 L2 learner and 30 native speaker freewrites that had been classified according to proficiency level (i.e., native speakers and beginning, intermediate and advanced L2 learners). Machine learning algorithms were used to classify the texts into their respective proficiency levels, with results indicating that count-based word frequency indices accurately classified 58% of the texts, while band-based indices yielded accuracies between 10% and 22% lower than those of the count-based indices.
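The difference between the two methods can be illustrated with a minimal Python sketch. The frequency list, word-family mapping and band assignments below are invented toy data (assumptions for illustration, not the resources used in the study); a real analysis would draw them from a reference corpus and a published word-family list. The count-based index averages a per-million frequency over each listed word form, whereas the band-based profile first folds forms into families (e.g. "went" into "go") and then reports the share of tokens falling in each band.

```python
import re
from collections import Counter

# Hypothetical toy resources (invented values, not the study's data).
# Word form -> normalized corpus frequency (occurrences per million words).
FREQ_PER_MILLION = {
    "the": 60000.0, "i": 20000.0, "to": 25000.0, "and": 27000.0,
    "go": 1500.0, "went": 800.0, "books": 150.0, "library": 60.0, "borrowed": 12.0,
}

# Word form -> word-family headword (hypothetical; forms not listed map to themselves).
FAMILY_OF = {
    "go": "go", "goes": "go", "went": "go",
    "book": "book", "books": "book",
    "borrow": "borrow", "borrowed": "borrow",
}

# Family headword -> frequency band (1 = most frequent 1,000 families, 2 = next 1,000).
BAND_OF = {"the": 1, "i": 1, "to": 1, "and": 1, "go": 1, "book": 1,
           "library": 2, "borrow": 2}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_based_index(text: str) -> float:
    """Mean per-million frequency of the word forms found in the list.

    Forms missing from the list are skipped (an assumption; they could
    instead be scored as zero).
    """
    counts = [FREQ_PER_MILLION[t] for t in tokenize(text) if t in FREQ_PER_MILLION]
    return sum(counts) / len(counts) if counts else 0.0


def band_profile(text: str) -> dict:
    """Proportion of tokens in each band; unlisted families count as 'off-list'."""
    tokens = tokenize(text)
    bands = Counter()
    for t in tokens:
        family = FAMILY_OF.get(t, t)  # fold the word form into its family
        bands[BAND_OF.get(family, "off-list")] += 1
    return {band: n / len(tokens) for band, n in bands.items()} if tokens else {}
```

With the toy data, "I went to the library and borrowed books yesterday." receives a count-based index of 16627.75 (the mean over the eight listed forms; "yesterday" is skipped), while its band profile places six of nine tokens in band 1, two in band 2 and one off-list. Note that "went" keeps its own form-specific count in the first measure but is absorbed into the band-1 family "go" in the second, which is the kind of information the count-based approach retains and the band-based approach discards.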

Original language	English (US)
Pages (from-to)	965-981
Number of pages	17
Journal	System
Volume	41
Issue number	4
DOI	https://doi.org/10.1016/j.system.2013.08.002
Status	Published, December 2013

Keywords

Active and passive lexical proficiency
Band-based frequency measures
Computational linguistics
Count-based frequency measures
Frequency analysis
Frequency lists
Learner corpora
Lexical sophistication

ASJC Scopus subject areas

Language and Linguistics
Education
Linguistics and Language

Cite this

@article{4e90d09f9d884bcdbd4998ab3b10daf0,

title = "Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications",

abstract = "In assessments of second language (L2) writing, quality of lexis typically claims more variance than other factors, and the most readily operationalized measure of lexical quality is word frequency. This study compares two methods of automatically assessing word frequency in learner productions. The first method, a band-based method, involves lexical frequency profiling, a procedure that first groups individual words into families and then sorts these into corpus-based frequency bands. The second method, a count-based method, assigns a normalized corpus frequency count to each individual word form used, yielding an average count for a text. Both band and count-based methods were used to analyze 100 L2 learner and 30 native speaker freewrites that had been classified according to proficiency level (i.e., native speakers and beginning, intermediate and advanced L2 learners). Machine learning algorithms were used to classify the texts into their respective proficiency levels with results indicating that count-based word frequency indices accurately classified 58% of the texts while band-based indices reported accuracies that were between 10% and 22% lower than count-based indices.",

keywords = "Active and passive lexical proficiency, Band-based frequency measures, Computational linguistics, Count-based frequency measures, Frequency analysis, Frequency lists, Learner corpora, Lexical sophistication",

author = "Crossley, Scott A. and Cobb, Tom and McNamara, Danielle",

note = "Funding Information: This research was supported in part by the Institute of Education Sciences (IES R305A080589 and IES R305G20018-02). Ideas expressed in this material are those of the authors and do not necessarily reflect the views of the IES. The authors would also like to thank the anonymous reviewers and the editors and staff of System for their support. Lastly, the authors would like to thank Scott Jarvis and Michael Daller for inviting them to the colloquium ``The validity of vocabulary measures'' at the 2011 American Association for Applied Linguistics conference, from which the ideas in this paper derive.",

year = "2013",

month = dec,

doi = "10.1016/j.system.2013.08.002",

language = "English (US)",

volume = "41",

pages = "965--981",

journal = "System",

issn = "0346-251X",

publisher = "Elsevier Limited",

number = "4",

}

TY  - JOUR
T1  - Comparing count-based and band-based indices of word frequency: Implications for active vocabulary research and pedagogical applications
AU  - Crossley, Scott A.
AU  - Cobb, Tom
AU  - McNamara, Danielle
N1  - Funding Information: This research was supported in part by the Institute of Education Sciences (IES R305A080589 and IES R305G20018-02). Ideas expressed in this material are those of the authors and do not necessarily reflect the views of the IES. The authors would also like to thank the anonymous reviewers and the editors and staff of System for their support. Lastly, the authors would like to thank Scott Jarvis and Michael Daller for inviting them to the colloquium "The validity of vocabulary measures" at the 2011 American Association for Applied Linguistics conference, from which the ideas in this paper derive.
PY  - 2013/12
Y1  - 2013/12
N2  - In assessments of second language (L2) writing, quality of lexis typically claims more variance than other factors, and the most readily operationalized measure of lexical quality is word frequency. This study compares two methods of automatically assessing word frequency in learner productions. The first method, a band-based method, involves lexical frequency profiling, a procedure that first groups individual words into families and then sorts these into corpus-based frequency bands. The second method, a count-based method, assigns a normalized corpus frequency count to each individual word form used, yielding an average count for a text. Both band and count-based methods were used to analyze 100 L2 learner and 30 native speaker freewrites that had been classified according to proficiency level (i.e., native speakers and beginning, intermediate and advanced L2 learners). Machine learning algorithms were used to classify the texts into their respective proficiency levels with results indicating that count-based word frequency indices accurately classified 58% of the texts while band-based indices reported accuracies that were between 10% and 22% lower than count-based indices.
AB  - In assessments of second language (L2) writing, quality of lexis typically claims more variance than other factors, and the most readily operationalized measure of lexical quality is word frequency. This study compares two methods of automatically assessing word frequency in learner productions. The first method, a band-based method, involves lexical frequency profiling, a procedure that first groups individual words into families and then sorts these into corpus-based frequency bands. The second method, a count-based method, assigns a normalized corpus frequency count to each individual word form used, yielding an average count for a text. Both band and count-based methods were used to analyze 100 L2 learner and 30 native speaker freewrites that had been classified according to proficiency level (i.e., native speakers and beginning, intermediate and advanced L2 learners). Machine learning algorithms were used to classify the texts into their respective proficiency levels with results indicating that count-based word frequency indices accurately classified 58% of the texts while band-based indices reported accuracies that were between 10% and 22% lower than count-based indices.
KW  - Active and passive lexical proficiency
KW  - Band-based frequency measures
KW  - Computational linguistics
KW  - Count-based frequency measures
KW  - Frequency analysis
KW  - Frequency lists
KW  - Learner corpora
KW  - Lexical sophistication
UR  - http://www.scopus.com/inward/record.url?scp=84886793009&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84886793009&partnerID=8YFLogxK
U2  - 10.1016/j.system.2013.08.002
DO  - 10.1016/j.system.2013.08.002
M3  - Article
AN  - SCOPUS:84886793009
SN  - 0346-251X
VL  - 41
SP  - 965
EP  - 981
JO  - System
JF  - System
IS  - 4
ER  -