Age of Exposure 2.0: Estimating word complexity using iterative models of word embeddings

Robert Mihai Botarleanu, Mihai Dascalu, Micah Watanabe, Scott Andrew Crossley, Danielle S. McNamara

Research output: Contribution to journal › Article › peer-review

Abstract

Age of acquisition (AoA) is a measure of word complexity which refers to the age at which a word is typically learned. AoA measures have shown strong correlations with reading comprehension, lexical decision times, and writing quality. AoA scores based on both adult and child data have limitations that allow for error in measurement, and increase the cost and effort to produce. In this paper, we introduce Age of Exposure (AoE) version 2, a proxy for human exposure to new vocabulary terms that expands AoA word lists through training regressors to predict AoA scores. Word2vec word embeddings are trained on cumulatively increasing corpora of texts, word exposure trajectories are generated by aligning the word2vec vector spaces, and features of words are derived for modeling AoA scores. Our prediction models achieve low errors (from 13% with a corresponding R² of .35 up to 7% with an R² of .74), can be uniformly applied to different AoA word lists, and generalize to the entire vocabulary of a language. Our method benefits from using existing readability indices to define the order of texts in the corpora, while the performed analyses confirm that the generated AoA scores accurately predicted the difficulty of texts (R² of .84, surpassing related previous work). Further, we provide evidence of the internal reliability of our word trajectory features, demonstrate the effectiveness of the word trajectory features when contrasted with simple lexical features, and show that the exclusion of features that rely on external resources does not significantly impact performance.
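As a rough illustration of what a word exposure trajectory could look like, the sketch below compares a word's vector in each intermediate embedding snapshot with its vector in the final snapshot. It is not the authors' implementation: it assumes the snapshots have already been aligned to a common space, uses cosine similarity as the trajectory measure, and treats a word missing from a snapshot as having similarity 0.0. The function names, the toy vectors, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def exposure_trajectory(snapshots, word):
    """Similarity of `word` in each snapshot to its vector in the final one.

    `snapshots` is a list of {word: vector} dicts ordered from the smallest
    to the largest cumulative corpus. Assumption: the vector spaces are
    already aligned, so vectors from different snapshots are comparable.
    A word absent from a snapshot contributes 0.0 (not yet encountered).
    """
    final = snapshots[-1][word]
    return [cosine(s[word], final) if word in s else 0.0 for s in snapshots]


def first_step_above(trajectory, threshold=0.9):
    """Index of the first snapshot whose similarity reaches `threshold`.

    Returns None if the threshold is never reached. The threshold value
    is an illustrative assumption, not a value taken from the paper.
    """
    for index, similarity in enumerate(trajectory):
        if similarity >= threshold:
            return index
    return None


# Toy, hand-made "aligned" snapshots (hypothetical data).
snapshots = [
    {"dog": [1.0, 0.0]},
    {"dog": [1.0, 0.1], "photosynthesis": [0.0, 1.0]},
    {"dog": [1.0, 0.1], "photosynthesis": [1.0, 1.0]},
]

for word in ("dog", "photosynthesis"):
    trajectory = exposure_trajectory(snapshots, word)
    print(word, trajectory, first_step_above(trajectory))
```

In this toy data the common word stabilizes in the first snapshot, while the rarer word only reaches its final representation in the last one. Features of this kind (for example, the step at which a trajectory first crosses a threshold) are the sort of input a regressor could use to predict AoA scores.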

Original language: English (US)
Journal: Behavior Research Methods
State: Accepted/In press - 2022

Keywords

  • Age of acquisition
  • Age of exposure
  • Word embeddings
  • Word exposure

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Psychology (miscellaneous)
  • Psychology (all)
