Predicting lexical proficiency in language learner texts using computational indices

Scott A. Crossley; Tom Salsbury; Danielle S. McNamara; Scott Jarvis

doi:10.1177/0265532210378031

Research output: Contribution to journal › Article › peer-review

121 Scopus citations

Abstract

The authors present a model of lexical proficiency based on lexical indices related to vocabulary size, depth of lexical knowledge, and accessibility to core lexical items. The lexical indices used in this study come from the computational tool Coh-Metrix and include word length scores, lexical diversity values, word frequency counts, hypernymy values, polysemy values, semantic co-referentiality, word meaningfulness, word concreteness, word imagability, and word familiarity. Human raters evaluated a corpus of 240 written texts using a standardized rubric of lexical proficiency. To ensure a variety of text levels, the corpus comprised 60 texts each from beginning, intermediate, and advanced second language (L2) adult English learners. The L2 texts were collected longitudinally from 10 English learners. In addition, 60 texts from native English speakers were collected. The holistic scores from the trained human raters were then correlated to a variety of lexical indices. The researchers found that lexical diversity, word hypernymy values and content word frequency explain 44% of the variance of the human evaluations of lexical proficiency in the examined writing samples. The findings represent an important step in the development of a model of lexical proficiency that incorporates both vocabulary size and depth of lexical knowledge features.

Original language	English (US)
Pages (from-to)	561-580
Number of pages	20
Journal	Language Testing
Volume	28
Issue number	4
DOIs	https://doi.org/10.1177/0265532210378031
State	Published - Oct 2011
Externally published	Yes

Keywords

computational linguistics
corpus linguistics
depth of lexical knowledge
hypernymy
lexical diversity
lexical frequency
lexical proficiency
vocabulary size

ASJC Scopus subject areas

Language and Linguistics
Social Sciences (miscellaneous)
Linguistics and Language

Cite this

@article{98166a2e9d2045ebb8e10da02cc52020,

title = "Predicting lexical proficiency in language learner texts using computational indices",

abstract = "The authors present a model of lexical proficiency based on lexical indices related to vocabulary size, depth of lexical knowledge, and accessibility to core lexical items. The lexical indices used in this study come from the computational tool Coh-Metrix and include word length scores, lexical diversity values, word frequency counts, hypernymy values, polysemy values, semantic co-referentiality, word meaningfulness, word concreteness, word imagability, and word familiarity. Human raters evaluated a corpus of 240 written texts using a standardized rubric of lexical proficiency. To ensure a variety of text levels, the corpus comprised 60 texts each from beginning, intermediate, and advanced second language (L2) adult English learners. The L2 texts were collected longitudinally from 10 English learners. In addition, 60 texts from native English speakers were collected. The holistic scores from the trained human raters were then correlated to a variety of lexical indices. The researchers found that lexical diversity, word hypernymy values and content word frequency explain 44% of the variance of the human evaluations of lexical proficiency in the examined writing samples. The findings represent an important step in the development of a model of lexical proficiency that incorporates both vocabulary size and depth of lexical knowledge features.",

keywords = "computational linguistics, corpus linguistics, depth of lexical knowledge, hypernymy, lexical diversity, lexical frequency, lexical proficiency, vocabulary size",

author = "Crossley, {Scott A.} and Salsbury, Tom and McNamara, {Danielle S.} and Jarvis, Scott",

note = "Funding Information: This research was supported in part by the Institute for Education Sciences (IES R305A080589 and IES R305G20018-02). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the IES. We also give special thanks to our human raters: Brandi Williams, Jessica Mann, and Abigail Voller. We would also like to acknowledge the assistance of Dr. Phillip McCarthy in helping validate the scoring rubric used in this study.",

year = "2011",

month = oct,

doi = "10.1177/0265532210378031",

language = "English (US)",

volume = "28",

pages = "561--580",

journal = "Language Testing",

issn = "0265-5322",

publisher = "SAGE Publications Ltd",

number = "4",

}

TY - JOUR

T1 - Predicting lexical proficiency in language learner texts using computational indices

AU - Crossley, Scott A.

AU - Salsbury, Tom

AU - McNamara, Danielle S.

AU - Jarvis, Scott

N1 - Funding Information: This research was supported in part by the Institute for Education Sciences (IES R305A080589 and IES R305G20018-02). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the IES. We also give special thanks to our human raters: Brandi Williams, Jessica Mann, and Abigail Voller. We would also like to acknowledge the assistance of Dr. Phillip McCarthy in helping validate the scoring rubric used in this study.

PY - 2011/10

Y1 - 2011/10

AB - The authors present a model of lexical proficiency based on lexical indices related to vocabulary size, depth of lexical knowledge, and accessibility to core lexical items. The lexical indices used in this study come from the computational tool Coh-Metrix and include word length scores, lexical diversity values, word frequency counts, hypernymy values, polysemy values, semantic co-referentiality, word meaningfulness, word concreteness, word imagability, and word familiarity. Human raters evaluated a corpus of 240 written texts using a standardized rubric of lexical proficiency. To ensure a variety of text levels, the corpus comprised 60 texts each from beginning, intermediate, and advanced second language (L2) adult English learners. The L2 texts were collected longitudinally from 10 English learners. In addition, 60 texts from native English speakers were collected. The holistic scores from the trained human raters were then correlated to a variety of lexical indices. The researchers found that lexical diversity, word hypernymy values and content word frequency explain 44% of the variance of the human evaluations of lexical proficiency in the examined writing samples. The findings represent an important step in the development of a model of lexical proficiency that incorporates both vocabulary size and depth of lexical knowledge features.

KW - computational linguistics

KW - corpus linguistics

KW - depth of lexical knowledge

KW - hypernymy

KW - lexical diversity

KW - lexical frequency

KW - lexical proficiency

KW - vocabulary size

UR - http://www.scopus.com/inward/record.url?scp=79952752042&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952752042&partnerID=8YFLogxK

U2 - 10.1177/0265532210378031

DO - 10.1177/0265532210378031

M3 - Article

AN - SCOPUS:79952752042

SN - 0265-5322

VL - 28

SP - 561

EP - 580

JO - Language Testing

JF - Language Testing

IS - 4

ER -
