Abstract
In this paper we bring to light a novel intersection between corpus linguistics and behavioral data that can be employed as an evaluation metric for resources for low-density languages, drawing on well-established psycholinguistic factors. Using the low-density language Maltese as a test case, we highlight the challenges that face researchers developing resources for languages with sparsely available data and identify a key empirical link between corpus and psycholinguistic research as a tool to evaluate corpus resources. Specifically, we compare two robust variables identified in the psycholinguistic literature: word frequency (as measured in a corpus) and word familiarity (as measured in a rating task). We then use three statistical methods to evaluate these comparisons. This research provides a multidisciplinary approach to corpus development and evaluation, in particular for less-resourced languages that lack wide access to diverse language data.
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010 |
Editors | Daniel Tapias, Irene Russo, Olivier Hamon, Stelios Piperidis, Nicoletta Calzolari, Khalid Choukri, Joseph Mariani, Helene Mazo, Bente Maegaard, Jan Odijk, Mike Rosner |
Publisher | European Language Resources Association (ELRA) |
Pages | 421-427 |
Number of pages | 7 |
ISBN (Electronic) | 2951740867, 9782951740860 |
State | Published - Jan 1 2010 |
Event | 7th International Conference on Language Resources and Evaluation, LREC 2010 - Valletta, Malta; Duration: May 17 2010 → May 23 2010 |
Other
Other | 7th International Conference on Language Resources and Evaluation, LREC 2010 |
---|---|
Country/Territory | Malta |
City | Valletta |
Period | 5/17/10 → 5/23/10 |
ASJC Scopus subject areas
- Education
- Library and Information Sciences
- Linguistics and Language
- Language and Linguistics